The Every Student Succeeds Act of 2015 requires states to use a variety of indicators, including standardized tests and attendance records, to designate schools for support and improvement based on schoolwide performance and the performance of groups of students within schools. Schoolwide and group-level performance indicators are also diagnostically relevant for district-level and school-level decisionmaking outside the formal accountability context. Like all measurements, performance
indicators are subject to measurement error, and some carry more random error than others. Measurement error has an outsized effect on smaller groups of students, rendering their measured performance unreliable, which can lead to misidentification of the groups with the greatest needs. Many states address the reliability problem by excluding student groups smaller than an established minimum size from accountability calculations, but this approach sacrifices equity, which requires counting students in all relevant groups.
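To see why small groups are unreliable, consider a proficiency rate estimated as the fraction of a group's n students who score proficient. Its sampling error shrinks only with the square root of group size (a standard statistical result, shown here for illustration; the formula is not taken from the report):

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
```

where \hat{p} is the observed rate and p is the group's underlying rate. A 20-student group therefore carries three times the sampling error of a 180-student group with the same underlying rate.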
With the aim of improving reliability, particularly for small groups of students, this study applied a stabilization approach based on Bayesian hierarchical modeling to group-level data (with groups defined by demographic designations) within schools in New Jersey. Stabilization substantially improved the reliability of test-based indicators, including proficiency rates and median student growth percentiles. The stabilization model used in this study was less effective for non-test-based indicators, such as chronic absenteeism and graduation rate, for several reasons related to their statistical properties. When stabilization is applied to the indicators best suited for it (such as proficiency and growth), it substantially changes the lists of schools designated for support and improvement. These results indicate that, applied correctly, stabilization can increase the reliability of performance indicators for the processes that use them, simultaneously improving accuracy and equity.
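The report fits a full Bayesian hierarchical model; the sketch below illustrates only the core shrinkage mechanism behind stabilization, in its simplest empirical-Bayes (normal-normal) form. The function and parameter names (stabilize_rate, prior_mean, prior_var) are hypothetical, chosen for illustration rather than drawn from the study.

```python
def stabilize_rate(observed_rate: float, n: int,
                   prior_mean: float, prior_var: float) -> float:
    """Empirical-Bayes shrinkage of a group proficiency rate.

    A minimal sketch of the stabilization idea: the observed rate is
    pulled toward a prior mean (e.g., the school or state average),
    with more shrinkage for smaller, noisier groups.
    """
    # Binomial sampling variance of the observed rate for a group of n students.
    sampling_var = observed_rate * (1.0 - observed_rate) / n
    # Precision-weighted shrinkage factor: near 1 for large groups
    # (trust the data), near 0 for small groups (trust the prior).
    w = prior_var / (prior_var + sampling_var)
    return w * observed_rate + (1.0 - w) * prior_mean

# A 12-student group with a low observed rate is pulled strongly toward
# the prior; a 300-student group with the same rate barely moves.
print(stabilize_rate(0.25, 12, prior_mean=0.50, prior_var=0.01))   # ~0.40
print(stabilize_rate(0.25, 300, prior_mean=0.50, prior_var=0.01))  # ~0.26
```

The key design property is that the shrinkage weight is driven by group size: estimates for large groups are left nearly untouched, while noisy small-group estimates are pulled toward the prior mean, which is what improves reliability without dropping small groups from accountability.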