Experimenting with Multiple Measures of Teacher Effectiveness

Apr 24, 2017

Stephen Lipscomb and Annie Li

How do we identify great teachers? And how do we help all teachers improve their craft? Some people say that you should have an objective tool based on test scores and focused on output, such as student growth measures. Others claim that observing teachers in the classroom is the way to go. And some feel that student surveys are a valuable source of feedback, because the information comes from those who spend the most time observing teachers in action.

When we take a step back and look at the research, it turns out that the best option might be a combination of all three measures. In 2009, Pittsburgh Public Schools began an ambitious effort to overhaul educator evaluations. Over the next several years, the district studied and tested all three measures of teacher effectiveness in partnership with researchers at Mathematica’s Educator Impact Laboratory. Our report showed that these measures capture complementary teaching skills, and each measure has the potential to identify meaningful differences between teachers. Combining multiple measures provides a broader view of teacher performance that reflects not only how teachers are doing, but also what they are doing in their practices.

What we learned is important because, despite the recent policy shift from teacher evaluation to school accountability, a good school still needs good teachers. Our review of 21 studies confirmed the big impact that teachers have on student learning, but found that some teachers move the needle more than others.

Guiding Principles

Pittsburgh has not been alone in experimenting with new measures of teacher effectiveness. School districts and states across the country launched similar initiatives in the past decade, including those in Charleston, Chicago, DC, Oklahoma, and Pennsylvania. Ed Impact Lab researchers have partnered with stakeholders to develop those measures and study the results. In doing so, we’ve found that two principles generally guide these initiatives:

Evaluations should incorporate information from multiple measures.
Evaluations should provide useful feedback on teaching practices.

The districts and states we’ve worked with have combined measures of student achievement growth with measures of teacher practices. And the results have been noticed. The Pennsylvania Department of Education looked at what Pittsburgh was doing and embarked on its own multiyear pilot of a new teacher evaluation system that includes both of these types of measures. State legislation passed in 2012 later made the new system mandatory statewide. We studied the results from the pilot for its most heavily weighted component, a classroom observation measure called the Framework for Teaching. School principals use the Framework for Teaching to evaluate teachers on 22 professional practices using four performance categories: distinguished, proficient, needs improvement, and failing. Our report on the pilot found that teachers’ classroom observation ratings and contributions to student achievement growth were positively related, consistent with several past studies (for example, the Measures of Effective Teaching project’s report). This was true for each of the 22 practices in Pennsylvania. As schools move forward with the new system, we stress the importance of ongoing training for principals in using the Framework for Teaching for high-stakes purposes. Observing teachers multiple times and with more than one rater will also make the results more accurate and reliable.

Rethinking the Binary Score

In the past, teacher evaluations in Pennsylvania had been criticized for rating teachers as either satisfactory or unsatisfactory with nothing in between, and hardly anyone was rated unsatisfactory. There’s evidence that the low rate of unsatisfactory ratings hasn’t changed, but it’s important to remember that these ratings carry the threat of dismissal, so it’s reasonable that they are used sparingly. On the other hand, schools in Pennsylvania can now look at differences among the vast majority of satisfactory teachers. Do they need improvement? Is their performance proficient or even distinguished?

Schools can also provide more practical feedback and support to teachers. Better information on teachers’ strengths and weaknesses can improve professional development and, ultimately, student learning. We know that the capacity to improve teaching likely exists in each school. Research studies, including our report in Pennsylvania, find more variation in teacher effectiveness within schools than across them. And family income does not appear to predetermine the quality of teachers that students receive, according to our report on 26 school districts across the country.

The Need for Evidence

States are submitting accountability plans as required by the Every Student Succeeds Act, which gives more decision-making power to the states. For example, they no longer must use a student growth measure such as teacher value-added, which the Obama Administration’s state-waiver system had required. (Read the Ed Impact blog post on value-added to learn about some of its controversies.) What states do with this flexibility is uncertain, but one thing is sure: we need more evidence on evaluation measures.

The new systems that states and districts are implementing aren’t perfect. For instance, Ed Impact Lab affiliate Matthew Steinberg recently wrote a journal article showing that the incoming achievement of a teacher’s students can influence the classroom observation rating their principal assigns them using the Framework for Teaching. In addition, the new systems in Pennsylvania and elsewhere include student learning objectives, a popular alternative student growth measure for evaluating teachers, particularly in nontested grades and subjects. Teachers and principals establish student learning objectives at the beginning of the school year and assess progress toward them at the end. In a recent review of the literature, we found limited available evidence on the validity of student learning objectives and no evidence on their reliability. More evidence is necessary to know whether they are good indicators of teachers’ true impact on student achievement and of other teacher performance measures. Until that evidence is available, school districts and states should be cautious about using student learning objectives for high-stakes purposes such as teacher evaluations.

About the Authors

Stephen Lipscomb

Principal Researcher

Annie Li

Researcher
