Stanford Graduate School of Education MS in Education Data Science Makes EDDR Required Reading
Friday, Aug 12, 2022 by Wendy Geller
When Professor Sanne Smith, Program Director for the MS in Education Data Science at Stanford’s Graduate School of Education, reached out to see if we could come (virtually) to talk with her seminar class, which had just read EDDR Volume I, Jared, DJ, and I weren’t just thrilled, we were elated!
We were (and are!) so happy because it was a sign that EDDR was getting a chance to help the next cohort of education data analysts get a jump on the realities of this field so their important work ahead might not hit quite as many stumbling blocks as ours once did.
If EDDR helps just one of those students avoid a pitfall we’ve weathered, who knows how much faster they might accomplish powerful analysis in support of public education? That’s been the whole point of this project since we started: to help others in the field do the great work we know, and have seen, can be done (BTW, EDDR Vol II has some awesome examples of just that kind of thing from experts throughout the country).
So, needless to say, once we peeled ourselves off of our respective ceilings, we shouted a big “We’re in!” and Professor Smith set up a video call with her class last October. DJ had a prior commitment and couldn’t make it, but Jared and I held the fort while she was away.
What an absolute gift that autumn day was with Professor Smith’s students. They asked some truly thoughtful questions and the conversation we had was both lively and frank (my favorite kind!). Below I’ve included some of the queries Professor Smith’s students posed to us and I’ve summarized the gist of our responses to those excellent prompts:
1. How, if at all, has their work/methods with education data changed as technology and data science methods have developed? (i.e., are there processes/tools/programs they find themselves using more now to do analysis, problem-solving etc. that were not as relevant at the start of their careers?)
Wendy: Holy smokes, yes! When I started, because my PhD program trained us with SPSS, that was the tool I was most comfortable using. Today, while I don’t get to do the analyses myself as much anymore because my role has changed, my Division works with the Microsoft stack, Python, JupyterHub and Jupyter Notebooks, and R. One thing to be mindful of in this field is that the tool sets you’ll encounter in your job are largely dictated by the existing technical or IT portfolio, so you may not have your pick. This is one of the reasons I would recommend some solid work in the industry-standard stacks like MS, R, and Python. There isn’t an organization out there that wouldn’t value an analyst coming in with experience in at least one of those ecosystems.
2. I love the collaborative tone of the project, both for its creation and the suggestions it gives for work-place practices. Doing data in the education world can sometimes feel isolating when you are one of the only, if not the only, specialist in your discipline at your organization. It'd be awesome if there were a way to somehow implement the community that you authors have found on a larger scale so that all us education data analysts/scientists across the country could ideate and problem-solve together.
Wendy: We couldn’t agree more! We hope very much that you’ll stay engaged with EDDR because that’s exactly what we’re hoping to build. Please stay in touch!
3. It would be cool to include a section with sources of highly-consulted, good quality education data sets. Oftentimes when we are doing district-level work, we need to merge in or consult data from external sources (like nationally representative studies, federal or state-level datasets, etc.). I think education data scientists would benefit from having a centralized list of ones that are particularly good/complete with a wide range of useful variables.
Wendy: Absolutely! I highly recommend the EDFacts data as a solid, large data set. They come from LEAs through SEAs across the US and are submitted annually. There’s a lot of documentation about what these data are, how they’re submitted, and what they can be used for.
DJ: This is a great question and a common request we get from state and district agency data folks, which is why we’re planning this chapter for our third volume.
For that chapter, I draw on the detailed metadata I've been cultivating in my business (consulting to education agencies and stakeholder groups) about these data sources. I document not just the different sources available, but also everything that's important to account for in each of the datasets (from each source, and across each year). The chapter will document this curated inventory of relevant data collections, with vivid examples of some of the most important caveats I’ve found.
The chapter answers versions of the following common questions:
● What do other states and districts report as public aggregates?
● What does USED report through CCD, EDFacts, CRDC, EDGE, etc.?
● How do these data—especially those reported by USED—differ from similar public aggregates reported by our state and others: in terms of business rules, disaggregation (by student subgroups, grades, etc.), timing, and suppression methods?
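As a rough illustration of the last question, here is a minimal Python sketch of lining up two sets of public aggregates and flagging where they disagree or where suppression prevents a comparison. Everything in it is hypothetical: the district names, the counts, the `SUPPRESSED` marker, and the `compare_aggregates` function are made up for the example and do not reflect any real file layout or agency business rule.

```python
# Hypothetical marker for a cell withheld by a suppression rule.
SUPPRESSED = None

# Made-up enrollment counts keyed by (district, grade); not real data.
state_counts = {
    ("District A", "Grade 3"): 412,
    ("District A", "Grade 4"): SUPPRESSED,
    ("District B", "Grade 3"): 95,
}
federal_counts = {
    ("District A", "Grade 3"): 409,
    ("District A", "Grade 4"): 18,
    ("District C", "Grade 3"): 230,
}


def compare_aggregates(state, federal):
    """Return (key, status, difference) for every cell in either source."""
    rows = []
    for key in sorted(set(state) | set(federal)):
        if key not in state:
            rows.append((key, "missing_in_state", None))
        elif key not in federal:
            rows.append((key, "missing_in_federal", None))
        elif state[key] is SUPPRESSED or federal[key] is SUPPRESSED:
            # A suppressed cell cannot be compared, only flagged.
            rows.append((key, "suppressed", None))
        else:
            diff = state[key] - federal[key]
            rows.append((key, "match" if diff == 0 else "differs", diff))
    return rows


for key, status, diff in compare_aggregates(state_counts, federal_counts):
    print(key, status, diff)
```

A flag like "differs" is only a starting point. The reason behind it (timing, business rules, or subgroup definitions) still has to come from each source's documentation.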
4. What are some of your favorite resources for ongoing professional development (e.g., other books, conferences, online resources, etc.)?
Wendy: I LOVE this question. My Division makes a lot of use of DataCamp as well as good ol’ YouTube. We use GitHub and there are many, many supportive best practices documents available via the National Center for Education Statistics Data Forum Best Practices publications. We also attend the STATS-DC conference annually, along with other Institute of Education Sciences offerings.
5. I liked a lot of the practical tips that you wouldn't learn about before entering the workforce (either through other books or school). There are things I learned only through working (e.g., it's always good to build relationships with the IT department), that would have been great to learn before working.
Wendy: Thank you! That’s exactly what we were hoping to share! If anything that tripped us up can help you avoid our same mistakes, that’s a huge win. Our motto is “a win for one of us is a win for all of us”. We hope you’ll consider becoming a contributor to the EDDR project so others can learn from the valuable lessons that you could teach them about work in this field.
6. The authors touch upon ethics in Chapter 5, but say there is "far more to discuss about ethics in data analytics than we have room in this chapter to review." Ethics are so important in education data analysis; I'd love to see them build out an entire chapter dedicated to ethics. I would also like to hear from more racially diverse contributors, just in case their experience working in organizations is different from the authors' experiences.
Wendy: What great ideas. We’re planning to cover some topics around ethics in Volume III, which will have a focus on sustainability, and we added three new voices to Volume II to try to broaden out the perspectives offered. There’s much more to do when it comes to inclusion, so please consider this an open invitation to all to join in this work with us. All voices are important here. Our work is better when we have many perspectives contributing.
DJ: And here’s an update on our efforts on these really important fronts. The second volume, which we finished shortly after this talk, includes relevant chapters from two new authors. In the Transparency chapter, Ellis Ott (Fairbanks North Star Borough) shares some very frank lessons learned on advocating for truths. And see LaCole Foots’ (Texas Education Agency) chapter on “the importance of self-reflection and awareness of how our identities shape our approach to our work.”
7. Can you provide one example when you navigated the "politics" within the organization and how you managed to deal with it and what recommendations would you give to a prospective data scientist/analyst on this topic?
Wendy: Absolutely. I think Jared is much better at this than I am, but when I’ve successfully navigated a political situation, the key thing I did was outline for others how the thing I was trying to get done would help them too. In my case, I needed more support to continue some core steps in our Enterprise Data Environment work, and that support needed to come from both my agency and our sister technical agency.
So, I wrote a memo that outlined all of the work that our cross-functional teams had done together, the automation and time savings we had achieved, and I translated that into dollar savings. Then, I took the step of showing how soon the work would “pay for itself” so I could highlight what a good investment the effort was. I also was careful to showcase how the work was a joint effort between the two agencies. This document went to the Governor eventually, and helped us get the support we needed to continue the work.
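For anyone who wants to try the same “pays for itself” framing, the arithmetic can be sketched in a few lines of Python. This is only a minimal illustration: the figures and the `payback_months` function are hypothetical and are not taken from the memo described above.

```python
def payback_months(upfront_cost, hours_saved_per_month, hourly_rate):
    """Months until cumulative dollar savings cover the upfront cost."""
    monthly_savings = hours_saved_per_month * hourly_rate
    if monthly_savings <= 0:
        raise ValueError("monthly savings must be positive to pay back")
    return upfront_cost / monthly_savings


# Hypothetical inputs for illustration only.
upfront_cost = 120_000.0       # one-time investment in dollars
hours_saved_per_month = 200.0  # staff hours saved by automation
hourly_rate = 50.0             # loaded cost of one staff hour in dollars

monthly_savings = hours_saved_per_month * hourly_rate
months = payback_months(upfront_cost, hours_saved_per_month, hourly_rate)

print(f"Monthly savings: ${monthly_savings:,.0f}")
print(f"Pays for itself in about {months:.1f} months")
```

The point of the exercise is to turn hours saved into dollars and a date, which makes it easier for decision-makers to see the work as an investment.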
8. I like that you put a lot of emphasis on descriptive work. People often tend to chase fancy buzzwords such as machine learning/deep learning but forget how invaluable some "basic" statistics are. This is crucial for every practitioner to keep in mind.
Wendy: Definitely. It’s all too often the case that people get excited about a fancy new method or cool advance in technology, but they forget that data and analyses are only as powerful as your ability to make the information they provide accessible to people who need and can use it to inform their work. Now, don’t get me wrong, I’m excited about learning new ways of doing things as well as how we can find important insights by applying nuanced models, but if we can’t make our findings understandable to wide audiences, we’re missing enormous opportunities to empower important work. It shouldn’t be about how difficult what you’re doing is, but about how many people it helps.
We want to thank Professor Smith for inviting us and all of you in the seminar for your terrific questions and ideas!