Attention to historically underserved student groups has long been a focus of federal accountability. In 2022, for the first time in three years, states had to identify low-performing schools, including those where particular subgroups of students were not meeting standards. Secretary Cardona, in an address in January of this year, framed the requirement as an opportunity to rethink accountability. He said:
"We need to recognize once and for all that standardized tests work best when they serve as a flashlight on what works and what needs our attention—not as hammers to drive the outcomes we want in education from the top down, often pointing fingers to those with greater needs and less resources."
In other words, accountability shouldn't be punitive; it should be illuminating, helping us understand where and how we need to improve. But our flashlight doesn't shine on everything we need it to. As states across the country know from experience, measuring the performance of subgroups within schools is particularly challenging, especially when the subgroups have small numbers of students. Subgroup performance measures are essential: they're used to identify schools for targeted support and to inform efforts to improve equity. But performance measures for individual subgroups within schools can bounce wildly from year to year simply due to chance, making them unstable and unreliable.
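To see why, imagine a subgroup whose students truly perform at exactly the same level every year. The short simulation below is a toy illustration, not an analysis of real data: the 60 percent proficiency rate, the subgroup sizes, and the number of years are all made up. It simply shows how much a measured proficiency rate can swing when only a handful of students are tested.

```python
# Toy simulation: the same true proficiency rate, measured in subgroups of
# different sizes over several "years." All numbers here are hypothetical.
import random

random.seed(1)

TRUE_RATE = 0.60  # assumed true proficiency rate, identical every year


def observed_rate(n_students):
    """Share of n_students who score proficient, given the same true rate."""
    return sum(random.random() < TRUE_RATE for _ in range(n_students)) / n_students


for n in (10, 50, 500):  # small, medium, and large subgroups
    yearly = [observed_rate(n) for _ in range(5)]  # five simulated "years"
    print(f"n={n:>3}: " + "  ".join(f"{r:.0%}" for r in yearly))
```

In runs like this, the 10-student subgroup's measured rate typically swings by 20 or more percentage points from year to year even though nothing real has changed, while the 500-student group stays close to the true 60 percent.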
Fortunately, this challenge has a solution. It's possible to stabilize subgroup performance measures in a way that promotes both accuracy and equity. A blog post from REL Mid-Atlantic offers a closer look at stabilizing performance measures using a statistical technique called Bayesian hierarchical modeling. You can also view a new infographic, designed with the broader education community in mind, for a high-level, visual introduction to Bayesian stabilization.
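For readers who want a more concrete feel for the idea, the sketch below shows the simplest version of this kind of stabilization: each school's observed subgroup rate is pulled toward a statewide average, and the smaller the subgroup, the stronger the pull. This is an illustrative simplification, not the REL Mid-Atlantic model itself; the statewide rate of 60 percent and the prior strength of 25 students are hypothetical values chosen only for the example.

```python
# Simplified sketch of Bayesian stabilization (shrinkage), not the REL model:
# combine a school's observed subgroup rate with the statewide rate, weighting
# by how many students the school's estimate is based on.

def stabilized_rate(successes, n_students, state_rate=0.60, prior_n=25):
    """Posterior-mean estimate under a Beta prior centered on the statewide rate.

    Equivalent to a weighted average of the school's observed subgroup rate and
    the statewide rate, with weights proportional to n_students and prior_n.
    Both state_rate and prior_n are hypothetical values for illustration.
    """
    return (successes + prior_n * state_rate) / (n_students + prior_n)


# Two schools with the same observed rate (37.5% proficient) but different sizes:
print(f"{stabilized_rate(3, 8):.0%}")      # 8 students: pulled well toward 60% (about 55%)
print(f"{stabilized_rate(150, 400):.0%}")  # 400 students: barely moves (about 39%)
```

A full Bayesian hierarchical model builds on this weighted-average idea, estimating from the data itself how much each school's measure should be shrunk rather than relying on a fixed prior strength.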