Measuring Teachers' Effectiveness: A Report from Phase 3 of Pennsylvania's Pilot of the Framework for Teaching
Pennsylvania Teacher and Principal Evaluation Pilot
Prepared for:
Team Pennsylvania Foundation
Bill & Melinda Gates Foundation
Key Findings:
- Teacher performance, as captured by the Framework for Teaching (FFT), was generally rated in the top two possible performance categories (distinguished or proficient) in 2012–2013 and 2011–2012 in the Pennsylvania districts covered in our study.
- Less than 0.1 percent of the teachers in our study were rated in the bottom category (failing).
- The FFT scores were internally consistent, meaning that the domains and the components within each domain appear to be measuring similar concepts.
- The correlations of the FFT scores with value-added measure (VAM) scores were all positive and generally statistically significant, ranging from 0.19 to 0.22 by domain.
In this report, we analyzed data on two measures of teacher performance: one (the Framework for Teaching, or FFT) is based largely on classroom observations, and the other (value-added measures, or VAM) is based on student test scores. The data cover 6,676 teachers from 269 districts in Pennsylvania, including Pittsburgh Public Schools. The observation-based data describe teacher performance on the 22 components of Charlotte Danielson's FFT, each of which is designed to capture a separate teaching practice. We used these data to estimate four domain scores and one overall Professional Practice Rating (PPR) score, and we merged these scores with data on teachers' estimated contributions to student achievement growth.
Based on these pilot data from the 2012–2013 school year, we estimate that, although less than 13 percent of teachers received the top rating (distinguished) for the overall PPR score, almost 85 percent were rated in the second-highest category (proficient). Less than 0.1 percent were rated in the bottom category (failing). The remaining teachers (around 2.6 percent) received needs improvement ratings. FFT scores were internally consistent, meaning that the domains and the components within each domain appear to be measuring similar concepts. Teachers with higher FFT scores tended to produce greater student achievement growth. The correlations of the FFT scores with VAM scores were all positive and generally statistically significant, ranging from 0.19 to 0.22 by domain.
We compared the results based on the 2012–2013 data with results based on 2011–2012 data from a previous pilot phase. For the most part, the findings were similar. More than 90 percent of teachers were rated in the top two performance categories in both phases, although the fraction of ratings in the top two categories decreased somewhat in Pittsburgh (which contributed more teachers to the pilot than any other district). The levels of internal consistency were in the acceptable to good ranges in both phases, with the overall PPR score having higher consistency than any of the domain scores in both phases. The correlations between parts of the FFT and VAM scores were almost always positive but also below 0.30 in both phases. The lowest correlations in 2011–2012 improved slightly in 2012–2013.
In sum, although FFT scores are overwhelmingly concentrated in the top two performance categories, the positive correlations with VAM suggest that the FFT provides some meaningful differentiation and captures aspects of teacher skills related to student achievement growth.
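For readers unfamiliar with the two statistics discussed above, the following is a minimal sketch of how internal consistency and a score correlation could be computed. This summary does not name the exact statistics used in the study, so the choice of Cronbach's alpha and the Pearson correlation is an assumption here. The function names, the 0 to 3 rating scale, and all toy values are hypothetical illustrations, not data or results from the pilot.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-teacher item-score lists.

    rows[i][j] is teacher i's rating on item j (e.g., one FFT component).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two teachers")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of summed scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up ratings for five teachers on four items (hypothetical 0-3 scale).
toy_ratings = [
    [3, 2, 3, 2],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 2, 1, 2],
    [2, 3, 2, 2],
]
# Made-up value-added estimates for the same five teachers.
toy_vam = [0.4, -0.1, 0.3, -0.5, 0.2]

toy_means = [mean(r) for r in toy_ratings]
print("alpha:", round(cronbach_alpha(toy_ratings), 3))
print("r:", round(pearson_r(toy_means, toy_vam), 3))
```

Alpha rises toward 1 as the items move together across teachers, which is the sense in which components "appear to be measuring similar concepts." A correlation near 0.2, as reported above, indicates a positive but modest linear relationship.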