The Every Student Succeeds Act requires states to identify low-performing schools for support and improvement based on schoolwide performance and the performance of demographic groups within schools. To do so, states employ a variety of measures that can be useful for both accountability and diagnostic purposes. But random variation plays a larger role in measurements from smaller student groups, reducing the measures’ reliability. Because small-group measurements bounce around from year to year, a school might be identified for support and improvement because of bad luck rather than truly low performance.
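To make the problem concrete, the simulation below (a minimal sketch with made-up numbers, not data from any state) holds a group's true proficiency rate fixed at 60 percent and shows how much the observed rate swings from year to year at different group sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.60   # hypothetical "true" proficiency rate, held constant across years
years = 10

for n in (15, 60, 240):  # hypothetical group sizes
    # Each year's observed rate is the share of n students who happen to score proficient.
    observed = rng.binomial(n, true_rate, size=years) / n
    print(f"n={n:>3}: year-to-year range {observed.min():.2f} to {observed.max():.2f}")
```

With only 15 students, the observed rate typically swings across a wide range even though nothing about the school changed; with 240 students it stays close to 60 percent.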
Group size cutoffs promote reliability at the cost of equity.
States try to reduce the influence of random variation by setting a minimum group size (typically between 10 and 30 students), below which a group's performance is not measured for accountability. This approach removes the most unreliable measurements, but it does so by making some groups of students invisible, so resources may not reach the students who need them most. The compromise sacrifices equity without truly solving the problem of reliability.
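The sketch below (a hypothetical cutoff and made-up groups) shows how an n-size rule works in practice: any group smaller than the cutoff simply drops out of measurement.

```python
MIN_GROUP_SIZE = 20  # hypothetical cutoff; states typically set this between 10 and 30

groups = [
    {"school": "A", "group": "All students",     "n": 410, "proficiency": 0.58},
    {"school": "A", "group": "English learners", "n": 12,  "proficiency": 0.35},
    {"school": "B", "group": "English learners", "n": 48,  "proficiency": 0.41},
]

# Groups below the cutoff are excluded from accountability measurement,
# so their students (often among the lowest performing) become invisible.
measured = [g for g in groups if g["n"] >= MIN_GROUP_SIZE]
excluded = [g for g in groups if g["n"] < MIN_GROUP_SIZE]
print("Excluded from measurement:", [(g["school"], g["group"]) for g in excluded])
```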
Bayesian stabilization offers a robust method to improve reliability and help states direct resources more confidently.
Recent work by the Regional Educational Laboratory (REL) Mid-Atlantic demonstrated how Bayesian stabilization can improve statistical reliability for reading and math proficiency measures while including smaller groups of students. In a newly released study conducted with data provided by the New Jersey Department of Education (NJDOE), we found that stabilization improved the reliability of both student growth measures and proficiency measures. Although proficiency measures capture important student outcomes, growth measures are a powerful tool for understanding how schools themselves affect those outcomes. Improving statistical reliability for both types of measures can contribute to a more valid and robust system for both accountability and diagnosis.
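The study's model is not reproduced here, but the core idea of Bayesian stabilization can be sketched as shrinkage: a school's or group's observed rate is blended with a statewide average, and the weight on the observed rate grows with the amount of data behind it. The illustration below (with made-up prior values, not the study's estimates) shows why measures for small groups move the most:

```python
def stabilize(observed_rate, n, prior_mean, prior_var):
    """Blend an observed proficiency rate with a prior mean.

    Illustrative empirical Bayes shrinkage, not the REL study's exact model:
    larger groups have smaller sampling variance, so their observed rates
    keep more weight and move less toward the prior.
    """
    sampling_var = observed_rate * (1 - observed_rate) / n  # binomial approximation
    weight = prior_var / (prior_var + sampling_var)
    return weight * observed_rate + (1 - weight) * prior_mean

# Hypothetical statewide prior: mean proficiency 0.55, between-school variance 0.01
print(stabilize(0.20, n=12,  prior_mean=0.55, prior_var=0.01))  # small group: pulled substantially toward 0.55
print(stabilize(0.20, n=400, prior_mean=0.55, prior_var=0.01))  # large school: barely moves
```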
To see how stabilizing both growth and proficiency measures may affect accountability designations, we examined changes in which schools would be identified for comprehensive support and improvement (CSI) using stabilized data from NJDOE. We found that stabilization would change the relative rankings of schools, moving some schools off the CSI list and adding others to it. Changes like these can make a significant difference in how states allocate resources to schools in need of support.
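CSI identification typically targets the lowest-ranked schools on a composite measure, so even modest changes in scores can change which schools fall below the cutoff. The sketch below uses made-up composite scores (higher is better) to show how re-ranking after stabilization can move one school off the list and add another:

```python
# Hypothetical composite scores before and after stabilization (higher = better).
schools = {
    "School A": {"raw": 0.18, "stabilized": 0.31},  # small school lifted by shrinkage
    "School B": {"raw": 0.22, "stabilized": 0.21},
    "School C": {"raw": 0.25, "stabilized": 0.24},
    "School D": {"raw": 0.40, "stabilized": 0.39},
}

def bottom_k(scores, key, k=2):
    """Return the k lowest-ranked schools on the given measure."""
    return set(sorted(scores, key=lambda s: scores[s][key])[:k])

only_raw = bottom_k(schools, "raw") - bottom_k(schools, "stabilized")
only_stabilized = bottom_k(schools, "stabilized") - bottom_k(schools, "raw")
print("Identified only with raw scores:       ", only_raw)         # {'School A'}
print("Identified only with stabilized scores:", only_stabilized)  # {'School C'}
```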
By stabilizing test-based measures, states can make measures of school performance more reliable and inclusive, and they can be confident that their resources are reaching the students and schools that need them most.
A new tool will make stabilization available at no cost to every education agency.
In early 2025, REL Mid-Atlantic will release a free tool, hosted on the U.S. Department of Education’s website, that will make stabilization available to state and local education agencies across the country. The Accuracy for Equity (A4E) tool will provide templates and guidance for data preprocessing; walk users through the stabilization process; and return clear, easily understandable summaries of results. Moreover, the tool requires no student-level data—instead, A4E relies on the kind of school-level and group-level data that states already report publicly and that pose no risk to individual privacy. We encourage you to learn how stabilization may support your accountability and diagnostic processes by improving measure reliability, especially for small groups of students.
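By way of illustration only (this is not the A4E template or its interface), the kind of privacy-safe, aggregate input that stabilization needs can be as simple as one row per school-by-group combination:

```python
import csv, io

# Made-up aggregates of the kind states already publish in report cards;
# no student-level records are involved.
aggregate_data = """school_id,student_group,n_tested,proficiency_rate,median_growth_percentile
0101,All students,412,0.58,52
0101,Economically disadvantaged,168,0.44,49
0101,English learners,23,0.35,47
"""

rows = list(csv.DictReader(io.StringIO(aggregate_data)))
print(rows[-1])  # a small group that stabilization can keep in the system
```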
To learn more, check out these resources from the RELs:
Stabilizing School Performance Measures for Accuracy and Equity
Outcomes, Impacts, and Processes: A Framework for Understanding School Performance