Why Conventional School “Reforms” Have Failed
Teacher and School Evaluations Are Based on Test Scores

by Bill Honig

The reform movement has failed to produce results overall, and reputable evaluations have shown that individual reform measures have also proved ineffective. Turnaround schools, charter schools, merit pay, and test-based school and teacher accountability have had either nonexistent or trivial effects. In his book Visible Learning, John Hattie writes that even when reforms produced small gains, those gains fell far below the improvements brought about by validated initiatives. In this article, I examine the failure of one of the reform movement’s major initiatives: high-stakes teacher and school evaluations based on student test scores.

Firing Teachers Based on Students’ Test Scores Is Not the Answer

A major problem with the “reform” strategy is its tremendous overemphasis on removing incompetent teachers on the basis of students’ test performance, enshrining mass firings as a key objective of school improvement efforts. For those seeking a “simple” way to improve educational outcomes, this approach has broad superficial appeal. Until the repeal of No Child Left Behind (NCLB) and the passage of the Every Student Succeeds Act (ESSA) in late 2015, test-based accountability for teachers was a key component of the Obama administration’s education policy and the price states paid for relief, in the form of waivers, from arbitrary federal requirements. ESSA eliminates both a national teacher evaluation system based on standardized test scores and the federal government’s ability to grant waivers.

Incompetent teachers should be let go if, and only if, credible and fair methods are used. But personnel changes must be part of a broader push for instructional improvement and efforts to raise the performance of all staff, measures that produce much larger effects on student achievement. For examples of these more positive measures, see the Aspen Institute’s March 2016 report Evaluation and Support Systems: A Roadmap for Improvement.

Until several years ago, the reform agenda relied primarily on test-driven, high-stakes accountability systems to punish or reward schools, a questionable enough approach, as discussed later in this article. A more recent shift compounded the error: districts and states began using the same tests to evaluate teachers and administrators and to mete out punishment, termination, or rewards.

Often accompanied by hostile, anti-teacher rhetoric, teacher evaluation systems based on test scores became a central plank of the reform movement. That is why some “reformers” have campaigned virulently against teachers’ unions and teachers’ due process (tenure) rights. They see these protections (which should be streamlined when they become too cumbersome) as blocking the unfettered ability to eliminate incompetent teachers and frustrating what, in their minds, is the most viable strategy for improving schools: firing bad teachers and pressuring the rest to improve.

As an aside, the article “Tenure: How Due Process Protects Teachers and Students” explains tenure in the context of due process rights and provides a cogent rationale for fair process protections during dismissal proceedings. Also see Dana Goldstein’s excellent book The Teacher Wars: A History of America’s Most Embattled Profession, which provides a gut-wrenching picture of the harm and arbitrary treatment teachers suffered before these due process rights were secured.

Many reformers, as well as their political and media supporters, frame the current debate about educational direction as a clash between themselves (the only ones trying to improve schools) and lazy or incompetent teachers and their unions. They contend that anyone who attempts to block their reform efforts is just trying to protect teacher prerogatives. This is why many policymakers and pundits take a confrontational rather than a cooperative stance. But educators’ opposition to the reform platform is much broader and goes much deeper than this all-too-common, specious analysis suggests. It is not that teachers (and their representatives) do not want to improve performance or do not see the need for schools to get better at what they do. Almost every professional wants that. What teachers and most district and school administrators object to is the path reformers have laid out toward that goal. They view reformers’ Test-and-Punish initiatives as ill-advised and ineffective at best and detrimental at worst. And they are correct.

Teacher Quality: Putting the Issue in Perspective

Contrary to the reform movement’s superficial and overheated rhetoric, teacher quality, while significant, is not the only important influence on student performance. According to various research studies, it accounts for only about 10% of the variation in student achievement. Bashing and blaming teachers is not a new trick. As Goldstein recounts in The Teacher Wars, this destructive ploy has emerged several times in our history, driven by “moral panic.” It is unjust to single out teachers as the primary cause of underperforming students and schools while failing to address more influential factors.

As one example, family and social dysfunction is on the rise and has had a devastating effect on educational performance, particularly among working-class families. Robert Putnam’s important new book, Our Kids: The American Dream in Crisis, reveals the alarming growth in recent decades of social pathology among white working-class families. Over the same period, professional families have remained much more stable. Socioeconomic status continues to outrank all other influences on student performance by a wide margin.

Of course, it is easier to blame teachers for not reversing the damage done by wage stagnation and the dramatic loss of blue-collar jobs in this country over the past several decades than to tackle these larger problems directly. In the US, we have seen rising inequality, an increase in single-parent families, a steady climb in drug use, and a dearth of supportive services. Reformers’ penchant for blaming teachers and school administrators for low school performance conveniently absolves other societal institutions and actors of their responsibility for ameliorating injurious socioeconomic trends. Stanford professor Linda Darling-Hammond, one of the most respected commentators on how best to improve schools, offers an alternative view. She has called for “reciprocal accountability,” in which all major stakeholders share responsibility for improving school performance, not just the teachers who are so easily scapegoated.

Julie Rummel provides a poignant and much-needed teacher’s perspective on the harsh reality encountered in many of our schools. She moved from a dysfunctional, poverty-stricken school, where she was labeled a “mediocre” teacher, to a more upscale campus, where she was regarded as great. Yet she made essentially no changes in how she taught or connected with her students.

In a new infographic, Kevin Welner of the National Education Policy Center underscores how unfair it is to expect schools to reverse the deleterious effects of poverty by themselves.

Finally, A Broader, Bolder Approach to Education, an organization devoted to addressing these broader issues, just relaunched its efforts following the enactment of the federal ESSA legislation.

Isabel Sawhill and Edward Rodrigue identify three measures that strongly affect whether an individual will remain in poverty: graduating from high school, being in a family with at least one full-time worker, and being at least 21 and married before having children. They found that these measures correlate closely with economic success and describe them as “the success sequence.” According to these researchers, only 2.4% of Americans who follow the success sequence live below the poverty line, while over 70% enjoy at least middle-class incomes, defined as at least 300% of the poverty line. Among those who meet none of the three criteria, 79% live in poverty. Only one of these measures, graduation, is directly school related. Having an adult with a full-time job in the family depends on successful job creation efforts.

According to Isabel Sawhill, “If we want to reduce poverty, one of the simplest, fastest and cheapest things we could do would be to make sure that as few people as possible become parents before they actually want to.” Here is an example of what could be done to substantially lower teen pregnancy and thus improve educational performance. From 2009 to mid-2015, a dramatically successful program in Colorado cut teen pregnancy and abortion rates by nearly 50% by providing teenagers with free long-acting birth control devices such as IUDs. The Susan Thompson Buffett Foundation initially funded the program, but when the grant ended in 2015, the Republican-controlled state Senate killed a bill to support this successful effort. Private donations saved the program for a year.

Tests Are Not Reliable Measures of Teacher Performance

Turning to in-school issues, current student tests are woefully inadequate for accurately identifying high- and low-performing teachers. In the past few years, a compelling body of research has emerged that demonstrates the dangers of test-based teacher evaluations. Three major research organizations have forcefully warned against using these measures for teacher evaluations: the American Educational Research Association (AERA), the National Academy of Education, and the American Statistical Association (ASA). The AERA has issued standards for teacher evaluation measures that virtually no existing instrument meets.

Value-added measures (VAMs) are currently a popular tool. They claim to assess student growth by aggregating individual students’ scores, adjusted for prior achievement and, in some models, socioeconomic factors. Like other tools in widespread use, they are not accurate enough for evaluating teachers. A seminal critique of the growing use of test scores and value-added measures was written by Linda Darling-Hammond, Audrey Amrein-Beardsley of Arizona State University, Edward Haertel of Stanford, and Jesse Rothstein of the University of California, Berkeley. Their research revealed how inexact the measures are and presented case studies of egregious misidentification, in which excellent teachers were labeled low performing and unfairly dismissed. In Teacher and Student Evaluation: Moving Beyond the Failure of School Reform, Alyson Lavigne and Thomas Good provide a comprehensive analysis of the history of teacher evaluation, and they too found present strategies to be defective. Further, Rick Stiggins, in his 2014 book Defensible Teacher Evaluation: Student Growth Through Classroom Assessment, reviews the major deficiencies of current high-stakes, test-driven teacher evaluation.

In March 2015, the respected journal Educational Researcher devoted an entire issue to critiques of the most common VAMs, along with some supporting statements with caveats. For a list of the top 15 research articles that discredit the use of test scores and VAM approaches, see Amrein-Beardsley’s VAMboozled website, which also offers a recommended reading list and lists 86 articles that have raised major technical concerns about VAM. For a more exhaustive list and further research, see the briefing paper Problems with the Use of Student Test Scores to Evaluate Teachers and the article “Studies Highlight Complexities of Using Value-Added Measures.” Both make a compelling case against test-driven teacher evaluation. Stanford professor Edward Haertel has written a particularly persuasive admonition against this practice, as has Leo Casey in “The Will to Quantify: The ‘Bottom Line’ in the Market Model of Education Reform,” published in the esteemed peer-reviewed journal Teachers College Record.

Two reports published in 2016 underscore the serious limitations of VAM. The first describes how VAM use failed in the Houston Independent School District (HISD); the second, produced by the Regional Educational Laboratory (REL) at WestEd, discusses how poorly VAM predicts teacher quality.

For further reading on the limits of VAM as a measure of teacher quality, see also “A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” by Stuart S. Yeh, Amrein-Beardsley’s blog post “The (Relentless) Will to Quantify,” and a critique of New York’s plan to use VAM methods to evaluate teachers. Finally, the Melbourne Graduate School of Education website has a compelling video lecture on why test-score evaluation does not work.

Evaluations Based on Test Scores Misidentify Teachers

These well-respected researchers make the following points. First, as the section “Standardized Tests Are Not the Best Measures of School or Teacher Quality” in the companion article Reformers Target the Wrong Levers of Improvement argues, such scores fail to accurately measure deeper and broader learning.

Just as important, student tests were never designed to be used for teacher evaluation, and they suffer from high levels of misidentification, or noise. Studies have shown that, with current tests, a teacher ranked at the 50th percentile could actually be anywhere from the 15th to the 85th percentile. A significant number of teachers bounce from top to bottom, or vice versa, from year to year. A recent report from the US Department of Education found very high misidentification rates, even with three years of data per teacher. One-fourth of the teachers identified as “low performers needing remediation” were actually in the middle range of performance, and one-fourth of the teachers deemed “average” actually needed professional development and support. That level of imprecision should be unacceptable in any respected profession.
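To make the misidentification problem concrete, the Python sketch below simulates a hypothetical district in which each teacher has a “true” effectiveness and the evaluation system sees only a noisy measurement of it. Everything here is an assumption for illustration, not a model taken from the studies cited: true effectiveness and noise are drawn from standard normal distributions, the function name `misidentification_rate` is hypothetical, and the correlation between measured and true effectiveness is a free parameter. The function flags the bottom quarter of teachers by measured score and reports what share of those flagged are not actually in the bottom quarter.

```python
import math
import random


def misidentification_rate(correlation: float, n_teachers: int = 10_000,
                           flag_share: float = 0.25, seed: int = 0) -> float:
    """Share of teachers flagged as low performers by a noisy measure whose
    true effectiveness is NOT in the bottom `flag_share` of the distribution.

    Hypothetical model (an illustrative assumption only):
      true effectiveness T ~ N(0, 1)
      measured score     M = r*T + sqrt(1 - r^2)*E, with E ~ N(0, 1)
    so that corr(M, T) = r.
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    noise_weight = math.sqrt(1.0 - correlation ** 2)

    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    measured = [correlation * t + noise_weight * rng.gauss(0.0, 1.0)
                for t in true_scores]

    k = int(n_teachers * flag_share)  # number of teachers flagged
    # Indices of the k lowest-scoring teachers on each scale.
    flagged = set(sorted(range(n_teachers), key=lambda i: measured[i])[:k])
    truly_low = set(sorted(range(n_teachers), key=lambda i: true_scores[i])[:k])

    # Flagged teachers who are not truly in the bottom group were misidentified.
    return len(flagged - truly_low) / k


if __name__ == "__main__":
    for r in (0.9, 0.7, 0.5):
        print(f"correlation {r}: {misidentification_rate(r):.0%} of flagged teachers misidentified")
```

Under these assumptions, only a perfectly correlated measure (`correlation=1.0`) flags exactly the right teachers; as the correlation drops, a growing share of the teachers labeled low performers are in fact average or better.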

A team of renowned researchers set out to demonstrate the absurdity of using student tests to determine teacher effectiveness. They found that changes in students’ height, which teachers obviously do not influence, yielded apparent “teacher effects” nearly as large as those based on test scores.

Examples of Test-Based Evaluations That Fail Exemplary Teachers

As a result of districts using these suspect measures, there have been numerous cases of top-flight teachers receiving negative scores and of teachers identified one year as “stellar” receiving a low rating the next. In some cases, this occurred because a teacher voluntarily agreed to take on a more difficult class and then suffered by comparison with the easier class they had taught in previous years.

The telling case of Pascale Mauclair clearly demonstrates how dangerous it can be to use such dubious evaluation measures. She agreed to teach a harder-to-educate class and did a superb job, but instead of getting congratulated she was identified by the press as “One of the Worst Teachers in the City.” A study conducted by Darling-Hammond and her colleagues also documented tragic cases of misidentification.

Finally, some teachers are taking the issue to court, claiming that current evaluation procedures are so arbitrary as to be fatally flawed. One example is Sheri Lederman’s case before the New York courts. Each year she works with students who achieve double the state’s average proficiency rates, but because her students had scored so high in previous years, she did not meet the state’s ill-conceived standard requiring growth from year to year. As a result, Lederman received a low rating. The trial court held that the system was “arbitrary and capricious” as applied to her and voided her test-based rating.

Growth measures can also produce extremely arbitrary results at the upper end. My favorite example is the ludicrous case of Carolyn Abbott, an exemplary teacher of gifted students in New York City. Her students made huge gains each year, invariably scoring at the highest levels. One year, her previous gifted class scored at the 98th percentile, and based on that high performance, her predicted score for the following year was set at the 97th percentile. The actual score landed at the 89th percentile. Many of her gifted students saw no reason to try hard on the state test since they were doing much more advanced work; on other tests that had consequences for students, they scored in the highest ranks. Even though Abbott was doing an exemplary job, the newspapers dubbed her the “Worst 8th Grade Teacher in the City.” The real story was the complete opposite.

I have some important feedback for the news media in this country: Shame on you for rushing to publish teacher rankings when you know, or should know, that these lists are bogus and prone to error. Even the more thoughtful advocates of VAMs caution against their use for high-stakes personnel decisions.

Other Flaws in Test-Based Evaluation Systems

As previously mentioned, test scores, and even evaluations by principals, tend to track the socioeconomic status of the student population. As a result, schools in low-income areas receive significantly more low evaluations and fewer high ones, a clearly unjust situation and a surefire deterrent to attracting our best teachers to the areas that need them most.

In addition, no one has yet solved the problem that most teachers are not math and reading teachers. They do not teach the math and reading content tested on the new PARCC and SBAC assessments, yet they are still held accountable based on those test scores. As a result, many teachers are now suing after receiving low evaluations based on the test performance of students they never taught. Another major defect in the measures being used for evaluation is that different assessment instruments yield widely dissimilar results, further confirmation of the inaccuracy and inadequacy of these measures.

Further, individual evaluations do not take into account school context, which has a large influence on teacher performance. The students of two similarly talented teachers will score differently if one teacher works in a school led by an effective principal, with functioning teams, a good school climate, and active engagement of students and parents, while the other works in a dysfunctional school with none of these attributes.

Finally, yearly decisions about which students are assigned to which teachers can tremendously skew a teacher’s evaluation. Nonrandom assignment of students undermines a key requirement of valid teacher evaluation systems and exposes teachers to potential principal favoritism and pressure from parents. Newer VAMs, which are increasingly common, were supposedly designed to correct for this, but apparently they do not. At the same time, one of the best predictors of student achievement is whether teachers are assigned to classes in their areas of expertise or classes that match their skill set. Ironically, when districts use information from test-based evaluations to flag possible misassignment and then reassign teachers to subjects aligned with their preparation and experience, students gain much more than they do when districts fire teachers on the basis of value-added scores. Perhaps the best use of high-stakes testing is to hold administrators accountable for properly assigning teachers, rather than to serve as an unsound basis for teacher evaluation.

Test-Based Evaluations Do Not Measure Good Teaching and Harm the Profession

Do evaluations based in large part on math and reading test scores actually measure “good teaching”? A 2014 report by well-regarded researchers Martin Polikoff and Andrew Porter says no. They looked at six districts nationwide and found that measures of students’ opportunity to learn (OTL) the content specified in standards and measures of instructional quality, both of which have been found to be highly predictive of student learning, showed weak or zero correlation with the VAM scores being used to evaluate teachers.

The upshot of all this research is that test-based teacher evaluation is unfair to the limited number of struggling teachers, who could benefit from professional support, and that the arbitrary threat it poses to all teachers impairs their performance and discourages them from remaining in the profession. The Test-and-Punish approach has also had a damaging effect on efforts to recruit new talent. Defective evaluation schemes have many negative consequences, including teachers avoiding hard-to-teach children and resisting collaborative team-building efforts. The demeaning rhetoric about widespread teacher incompetence is another key factor contributing to growing teacher demoralization. For more on this topic, see the companion article Reformers Allowed Their Rhetoric to Be Hijacked.

The op-ed piece “Standardized Tests Don’t Help Us Evaluate Teachers” is an in-the-trenches summary by a Los Angeles Unified attorney who helped create teacher evaluations and now finds them defective. In The Teacher Wars, Dana Goldstein offers an excellent account of the disastrous consequences of personnel decisions tied to student test scores. For an excellent summary of why test scores should not be used for consequential evaluations, see David Berliner’s piece.

Finally, an extensive 2016 report on teacher evaluation policies by Thomas Toch recommends using evaluative information primarily for program and teacher improvement. This shifts the main purpose of evaluation from rooting out the lowest performers to lifting the performance of the whole staff.

The business community has been moving away from ranking schemes for decades, recognizing that such evaluations are superficial, work against team building, cause lower performance, and discourage risk taking by employees. In a clear case of hypocrisy, many business leaders have no compunction about recommending such discarded measures for schools.

Fortunately, many school districts and states have been withdrawing from, or minimizing, mandatory test-based teacher evaluations that lead to dismissal proceedings. Many others are using teacher evaluations as just one of many data sets that provide useful information and feedback to teachers and faculties about where to concentrate improvement efforts. Three examples are Tulsa, the state of Michigan, and Houston. State Action to Advance Teacher Evaluation, a comprehensive report by the Southern Regional Education Board, and the California blueprint Greatness by Design both advocate using evaluations to feed useful information back to teachers for improvement.

One of the most prominent architects of teacher evaluation, Charlotte Danielson, whose rubrics are in widespread use, has castigated the way evaluations are currently being conducted and used. Even New York governor Andrew Cuomo, who was a strong advocate for test-based high-stakes teacher evaluation, has backtracked, and the New York Board of Regents has halted required state test-based teacher evaluations for four years. Many educational leaders, such as New York’s Nassau County superintendents, had warned against this practice. Some political leaders, such as Hillary Clinton, are also beginning to speak out about the dangers of test-based teacher evaluation. Finally, a court in New Mexico found that VAM scores based on tests are too imprecise to be used as a basis for consequences.

Bill Gates, one of the strongest proponents of teacher evaluation strategies, has issued warnings about their overuse:

Too many school systems are using teacher evaluations as merely a tool for personnel decisions, not helping teachers get better. . . . Many systems today are about hiring and firing, not a tool for learning.

In response to this growing resistance to test-based teacher evaluation, the recent reauthorization of the federal Elementary and Secondary Education Act (ESEA), now named the Every Student Succeeds Act (ESSA), ignores test-based teacher evaluation.

A More Effective Approach to Teacher Evaluation

To begin with, advocates of high-stakes teacher evaluation hold a misguided view of “teacher quality.” They think of it as a static individual attribute that, after the first few years, can’t really change. A more sophisticated view sees teacher quality as dynamic, something that does and should grow over time. Esther Quintero, a management expert, supports this point of view. Writing for the Albert Shanker Institute blog, Quintero explains:

In the US, a number of unstated but common assumptions about “teacher quality” suffuse the entire school improvement conversation. As researchers have noted . . . instructional effectiveness is implicitly viewed as an attribute of individuals, a quality that exists in a sort of vacuum (or independent of the context of teachers’ work), and which, as a result, teachers can carry with them, across and between schools. Effectiveness also is often perceived as fairly stable: teachers learn their craft within the first few years in the classroom and then plateau, but, at the end of the day, some teachers have what it takes and others just don’t. So, the general assumption is that a “good teacher” will be effective under any conditions, and the quality of a given school is determined by how many individual “good teachers” it has acquired.

In British Columbia, Hong Kong, Shanghai and Singapore, none of these assumptions seems to be at work. Teacher effectiveness is not something fixed that individual teachers do or don’t possess. Rather, effectiveness is both a quality and an aspiration of schools: Schools ought to be organized and resourced so that teachers continuously and collaboratively improve. In these high performance systems, the whole (school effectiveness) is greater than the sum of its parts (individual teacher effectiveness) because, as Susan Moore Johnson argues:

Whatever level of human capital schools acquire through hiring can subsequently be developed through activities such as grade-level or subject-based teams of teachers, faculty committees, professional development, coaching, evaluation, and informal interactions. As teachers join together to solve problems and learn from one another, the school’s instructional capacity becomes greater than the sum of its parts.

The Learning Policy Institute published a report by Kini and Podolsky, Does Teaching Experience Increase Teacher Effectiveness? A Review of the Research, which debunks the idea that teachers stop becoming more effective after an initial learning spurt in their first three years. Obviously, well-constructed professional learning will enhance this normal growth process.

It is important to dismiss incompetent teachers if their dismissal is done fairly and is part of an overall effort that gives teachers the support and time they need to improve before they are dismissed. With the right resources and approach, many low-performing teachers become good teachers. Our most successful districts do not ignore struggling teachers. They use effective assessments that include feedback, peer participation and review, and support. They have organized schools to be learning institutions in which all staff can continuously improve. These districts are also careful when making initial hiring decisions and granting tenure. Ironically, districts that follow these more supportive evaluation strategies often end up with higher dismissal rates than those following the pure Test-and-Punish approach. See the policy brief Evaluation, Accountability, and Professional Development in an Opportunity Culture, which outlines proven, more positive approaches to teacher evaluation.

Lavigne and Good have surveyed the best research and practices in the field. In their 2015 book, Improving Teachers Through Observation and Feedback, they offer powerful suggestions for conducting evaluation in the service of improved performance. Their proposals differ markedly from what most districts are currently doing: Lavigne and Good emphasize useful feedback and cooperative effort rather than formal evaluations. Information from a teacher’s student test results can help that teacher improve instruction when valid methods, measures, and strategies are employed and the results are checked for accuracy. Again, student test data should not be used in personnel decisions but as part of a broad-scale effort to collect evidence that will help teachers and schools improve. The current emphasis on narrowly conceived, test-based, high-stakes teacher evaluation is unfair and ineffective.

A major report by the Network for Public Education, Teachers Talk Back: Educators on the Impact of Teacher Evaluation, reinforces the view that test-based teacher evaluation is harmful and that evaluations should instead focus on improving instruction, as some states such as California have done.

Does Dismissing Incompetent Teachers Improve Student Outcomes?

This is the most crucial question for those who support Test-and-Punish. First, after a decade of intensive effort to pursue teacher evaluation schemes, the results have been negligible. Rick Hess reports on a study conducted by Matt Kraft and Allison Gilmour. According to Hess:

[The authors] look at teacher evaluation results in 19 states that have adopted new evaluation systems since 2009. Unfortunately, all that time, money, and passion haven’t delivered much. Kraft and Gilmour note that, after all is said and done, the share of teachers identified as effective in those 19 states inched down from more than 99% to a little over 97% in 2015.

Second, even if the evaluation process were fair and accurate, fixating on the three to five percent of teachers who are not performing affects only a small fraction of the staff, with limited payoff. In a school of 20 teachers, dismissing one incompetent teacher will help one class of students but does nothing for the other 19 classes. Meanwhile, making test-based evaluations and dismissals a major policy component drags the other 19 teachers into legally defensible but burdensome and often superficial evaluation schemes.

Compared with schoolwide initiatives aimed at improving the entire staff and unleashing their potential as a coordinated team, the effect of firing a failing teacher on overall student performance is small. Contrary to recent reform rhetoric, even if the three to five percent of teachers who are incompetent were dismissed tomorrow, student gains would be minimal. There are far more productive strategies for improving student performance.

Recently, a media frenzy erupted over a research report that claimed a huge benefit from firing the worst teachers. The report sensationalized the effect of replacing a poor teacher with an average one by stating that the lifetime earnings of a single class would increase by $266,000. Diane Ravitch has questioned the methodology used in the report. Even if the research were valid and its findings accurate, the boost in earnings is quite trivial. As the report itself states, the figure amounts to about $7,000 per student in discounted lifetime earnings, or less than $200 per year. Put another way, the reported effect sizes are tiny compared to the payoff from other improvement strategies. Finally, the report acknowledges that the correlations are low, at about 0.5, which means that large numbers of teachers are identified as lacking who aren’t, and similar numbers are identified as proficient who are actually struggling.
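The per-year figure is simple division. As a rough check, assuming for illustration only a working life of about 40 years (the report’s own time horizon is an assumption here):

$$
\frac{\$7{,}000 \text{ per student}}{40 \text{ working years}} = \$175 \text{ per student per year} < \$200
$$

Any working life longer than 35 years keeps the figure below $200 a year, since $7,000 divided by 35 is exactly $200.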

Measures of Effective Teaching (MET) was a major study sponsored by the Gates Foundation. It found that the measures of teacher effectiveness did predict student performance in mathematics, although, again, effect sizes were small. Significant technical issues were raised about the methodology of this study as well. Critics have asked: Were random assignments fully carried out? Did the teachers of hard-to-educate students participate in sufficient numbers to validate the results? Were all the data reported? Is the report based on the flawed assumption that test scores, principal evaluations, and student surveys predicted the same thing?

However, the most damning objection to using the MET report to support high-stakes testing for personnel decisions comes from the report itself. It cautions against such use, noting that the researchers did not determine, or even consider, whether using evaluations for high-stakes personnel decisions might negate their findings. The report conjectures that teaching to the test, narrowing the curriculum, gaming the system, and failing to cooperate with other teachers while competing for bonuses could very well lower student performance. Finally, and critically, the report presents no evidence that identifying high- or low-quality teachers led to improved instruction.

As the extensive research cited above demonstrates, there is thin to nonexistent evidence that a reform strategy focused on firing incompetent teachers produces any significant gains in student achievement. Further, policymakers’ misplaced emphasis on the few suspected lowest performers comes at a huge cost. Frequently, all teachers, regardless of their demonstrated capabilities, are evaluated through expensive, hugely complicated, and time-consuming procedures. These evaluations gravitate toward a checklist mentality of individual items, which trivializes teaching instead of viewing it through a more complex and accurate lens. In addition to the previously mentioned Teacher and Student Evaluation: Moving Beyond the Failure of School Reform (Lavigne and Good, 2014) and The Teacher Wars (Goldstein, 2014), a video produced by WestEd provides an excellent summary of the best research and principles of effective professional evaluation systems.

Can Evaluations by Principals Fix the Problems of Test-Based Accountability?

Relying on principals’ classroom observations cannot obviate the deficiencies of using test scores to evaluate teachers. Principals’ evaluations of teachers are heavily influenced by the socioeconomic levels of the teachers’ students. According to Alisha Kirby:

As the components of teacher evaluations remain under debate among policymakers, a new study suggests the results of classroom observation may hinge more on the students’ capabilities than the teacher’s.

Analysis from the American Institutes for Research and the University of Pennsylvania’s Graduate School of Education found that students’ behavior and prior academic achievement weighs heavily on teacher performance and can skew the results of an evaluation.

“When information about teacher performance does not reflect a teacher’s practice, but rather the students to whom the teacher is assigned, such systems are at risk of misidentifying and mislabeling teacher performance,” reported Rachel Garrett of the American Institutes for Research and Matthew Steinberg from the University of Pennsylvania’s Graduate School of Education.

Two other papers reached the same conclusions: Leading via Teacher Evaluation: The Case of the Missing Clothes? and a study published in the journal Educational Evaluation and Policy Analysis.

Further, most principals are not adequately prepared to conduct accurate teacher evaluations. Many now find themselves spending an inordinate amount of time conducting formal classroom observations with extensive item checklists in hand. They are visiting each classroom several times a year rather than spending the time needed for schoolwide efforts that will improve curriculum and instruction. It is a case of evaluation run amok. Lavigne and Good provide a chilling example of this pathology. Under Tennessee’s byzantine and excessive teacher evaluation system, principals must visit each teacher’s classroom four to six times a year. In a school of 20 teachers, that means spending between 176 and 260 hours per year on observation, not assistance. Some research even suggests that classroom observations for purposes of evaluation actually reduce performance.
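The Tennessee figures can be checked with simple arithmetic. The Python sketch below divides the quoted annual totals by the number of observations and converts them into work weeks. The function name `observation_load` is hypothetical, and the 40-hour week is an assumption made here, not part of the Lavigne and Good example.

```python
# Figures from the Tennessee example: 20 teachers, 4 to 6 observations each,
# 176 to 260 principal hours per year.
TEACHERS = 20
SCENARIOS = [(4, 176), (6, 260)]  # (observations per teacher, total hours)
HOURS_PER_WEEK = 40  # assumption: a nominal 40-hour work week


def observation_load(teachers, visits, total_hours, hours_per_week=HOURS_PER_WEEK):
    """Return (observations, hours per observation, work weeks consumed)."""
    observations = teachers * visits
    per_observation = total_hours / observations
    weeks = total_hours / hours_per_week
    return observations, per_observation, weeks


for visits, hours in SCENARIOS:
    obs, per_obs, weeks = observation_load(TEACHERS, visits, hours)
    print(f"{visits} visits/teacher: {obs} observations, "
          f"{per_obs:.1f} h each, about {weeks:.1f} work weeks")
```

Both ends of the range work out to roughly 2.2 hours per observation, which suggests (an assumption here) that the totals include more than the classroom visit itself, such as pre- and post-observation conferences and write-ups. Either way, the load comes to roughly 4.4 to 6.5 forty-hour weeks of a principal’s year.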

A pilot report from Chicago found small effects when principals used an evaluation strategy that included two observations of reading teachers per year. The results of the evaluations were used for teacher and school improvement, not harsh consequences. A key finding was that extensive training of principals, both in observation techniques and in using the evaluations for program improvement, made a large difference. Finally, according to Peter DeWitt, many walkthroughs by principals miss the essence of good teaching and instead concentrate on trivia.

A Narrow Focus on Dismissing Teachers Detracts from Effective Improvement Measures

Crucially, such a narrow policy focus on dismissing a few teachers often leads to a failure to address other vital in-school measures, which significantly influence the performance of all teachers and the achievement of students. For example, large numbers of teachers leave inner-city schools each year. Teacher churn and the resulting heavy use of substitutes are a major reason for low student performance. Excellent teachers are leaving the profession due to the stress of teaching in low-income urban schools and dreadful working conditions. This problem overshadows the damage done by a few underperforming teachers.

Several researchers have recommended policies aimed at encouraging the retention of our best teachers. The New Teacher Project (TNTP) published a report in 2013 entitled The Irreplaceables: Understanding the Real Retention Crisis in America’s Urban Schools. The report laments that most districts lack policies to encourage the highest-performing 20% of teachers to stay and, as a result, suffer high attrition rates. Top teachers want collegiality, membership in effective teams, better working conditions, someone paying attention to them, and career paths that let them keep teaching while taking on additional responsibility, such as helping other teachers or solving school performance problems, and earning more money. Districts that concentrate solely on firing incompetent teachers miss this much larger and more productive target.

It is also important to recognize that the quality of the curriculum and instructional materials is just about as important as teacher quality. For more about the importance of curriculum and educational resources, see the companion article Provide High-Quality Instruction.

In addition, the level of school funding matters. Yes, money does make a difference. Recent reports by moderate and conservative institutions refute reformers’ often-expressed claim that expenditure levels are not a key component of quality. The reports find that increased funding results in improved student performance and, conversely, that cutting school budgets depresses outcomes. Similar results were found in Indiana after the state drastically cut educational support. The companion article Provide Adequate School Funding covers the role of funding in its discussion of district/state support for improving schools.

For a review of the literature showing that funding matters, see Does Money Matter in Education? Unfortunately, the “money doesn’t matter” philosophy and political antipathy toward public education in this country have substantially hampered school funding. Most states are spending below their 2008 levels, and some are cutting even more.

Equally important is site and district leadership, particularly as it relates to building systems that connect teaching, curriculum, and instruction; to continuously improving these elements; and to improving the school climate by increasing the engagement of teachers, students, parents, and the community. A recent report by Thomas Kane of Harvard found that teachers’ perception of the school as a good place to work improved performance. In math, the amount of professional development and teacher feedback also helped. Among in-school influences on student performance, principal leadership accounts for about one-quarter and teacher quality for about one-third.

For a perceptive two-part series on how best to train principals to lead and a description of efforts currently under way in four states, see Marc Tucker’s blog posts “Organizations in Which Teachers Can Do Their Best Work,” Part 1 and Part 2. For a comprehensive report on principal training, see The School Principal as Leader: Guiding Schools to Better Teaching and Learning and the standards for school leadership approved by the National Policy Board for Educational Administration in 2015.

There are other essential components of effective improvement efforts: provision of social support and medical services, ongoing professional development and team building for all teachers, and the use of just-in-time assessment systems and valid data on each student’s progress to inform instruction.

Reform measures that emphasize terminating incompetent teachers on the basis of questionable methods not only lower teachers’ morale and efficacy but also inevitably lead to conflict with staff who understand the underlying flaws in the strategy. The evidence is clear: conflict between key stakeholders tends to sabotage the cooperative efforts needed to achieve effective reform. As many have said, “You can’t fire your way to educational greatness.”

Targeting the Lowest-Performing Schools with Closure and Other Drastic Measures Is Usually Ineffective

When it comes to evaluating schools, high-stakes accountability based on tests has been just as ineffective and just as problematic in terms of unintended consequences. Concentrating on the lowest-scoring five percent of schools and responding to their performance with drastic measures (closures, mass firings, or conversion to charters) has produced negligible results. Such measures do, however, severely affect those schools, their students, and the surrounding communities. This is all the more concerning given that many of the affected schools were unfairly misidentified: they were actually progressing as fast as or faster than the remaining schools in their districts. The failure of school turnaround policies has been documented by a number of respected sources. The National Education Policy Center, describing a meta-analysis by Tina Trujillo of the University of California, Berkeley and Michelle Renée of Brown’s Annenberg Institute for School Reform, reports their conclusion that school turnaround policies are “more likely to cause upheaval than to help.” See also pages 96–97 of the previously cited Teacher and Student Evaluation: Moving Beyond the Failure of School Reform, and for an overall study of turnaround strategies, see Emerging State Turnaround Strategies, a report prepared by the Education Commission of the States.

States that used tests to grade schools have found major problems with accuracy, and many have reversed the policy. For a critique of Florida’s 15-year failed effort to get school grades right, see “School Scandals Reveal Problem with Grading Schools.” For a broader, balanced critique of Florida’s reform initiatives, see the Shanker Institute’s policy brief. Many have questioned whether the state reform formula and direction were actually the driving force behind the early gains. Instead, they point to the efforts made by excellent local superintendents who stressed the Build-and-Support approach. Florida’s gains have since stalled following school-funding cutbacks, massive charter expansion, and stringent accountability measures. Other reports show that segregation and in-school deficiencies predict achievement gaps considerably better than differences between schools do.

This research demonstrates that, as yet, the knowledge base for identifying failing schools is not sufficiently developed to allow for fair assessments. As a result, many local sites are labeled as failures simply because they enroll large numbers of poor students and/or students of color. In addition, there is no clear research-based consensus regarding the best ways to intervene in low-performing schools. For example, recent evaluations of the federal School Improvement Grants program, which is aimed at the lowest-performing schools, found a slight overall improvement, but one-third of the grantees actually had falling scores. The federal government is now giving applicants somewhat more flexibility under the program, admitting that its previous prescriptions were off base. Moreover, even if reform efforts were fair and successful, focusing on the few schools at the bottom ignores the vast majority of children. As Michael Fullan, one of the most respected leaders of the Build-and-Support approach, has pointed out, policies aimed at improving all schools have far better results. Edward Fiske and Helen Ladd made a similar point in an op-ed about successful low-income districts in London. The districts that flourished pursued a districtwide strategic improvement plan rather than targeting the lowest performers, used broad accountability systems that went beyond test scores, and provided support for low-income students.

Recent Developments

9/14/2016 Where school turnarounds have been successful, they have been embedded in overall district improvement efforts and have avoided a punitive approach. A new report by the Center for American Progress https://www.americanprogress.org/issues/education/report/2016/09/13/143922/7-tenets-for-sustainable-school-turnaround/ identifies seven tenets for sustainable school turnaround:

Grant districts, and ultimately the state, the authority to intervene in failing schools.

Provide significant resources to support planning and restructuring and leverage competitive grants.

Treat the district as the unit of change and hold it accountable for school improvement.

Create transparent tiers of intervention and support combined with ongoing capacity building and sharing best practices.

Promote stakeholder engagement.

Create pipeline programs for developing and supporting effective turnaround school leaders.

Embed evaluation and evidence-building activities in school implementation.

7/30/2016 Audrey Amrein-Beardsley reviewed an excellent 1986 piece by Ed Haertel on the deficiencies of test-based evaluations of teachers. http://vamboozled.com/wp-content/uploads/2015/01/Haertel_1986.pdf She details six major points Haertel makes, all consistent with the article above.

7/30/2016 Another researcher debunks the value of value-added measures (VAMs) for teacher evaluation. http://vamboozled.com/vams-are-never-accurate-reliable-and-valid/ Another district, Houston, has eliminated VAMs for teacher evaluation. http://vamboozled.com/no-more-evaas-for-houston-school-board-tie-vote-means-non-renewal/

BBS Companion Articles

Why Conventional School “Reforms” Have Failed
Reformers Target the Wrong Levers of Improvement
How Top Performers Build-and-Support
Provide High-Quality Instruction
Provide Adequate School Funding

Reference Notes

Hattie, J. (2008). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. London: Routledge.

Firing Teachers Based on Students’ Test Scores Is Not the Answer
Aspen Institute. (2016, Mar). Evaluation and Support Systems: A Roadmap for Improvement. http://www.aspeninstitute.org/publications/teacher-evaluation-support-systems-roadmap-improvement See also Brown, C., Partelow, L., & Konoske-Graf, A. (2016, Mar 16). Educator Evaluation: A Case Study of Massachusetts’ Approach. https://www.americanprogress.org/issues/education/report/2016/03/16/133038/educator-evaluation/ Humphrey, D., Koppich, J., & Tiffany-Morales, J. (2016, Mar). Replacing Teacher Evaluation Systems with Systems of Professional Growth: Lessons from Three California School Districts and Their Teachers’ Unions. https://www.sri.com/work/publications/replacing-teacher-evaluation-systems-systems-professional-growth-lessons-three Kerchner, C. T. (2016, Mar 21). Five Lessons for Creating Effective Teacher Evaluations. http://blogs.edweek.org/edweek/on_california/2016/03/five_lessons_for_creating_effective_teacher_evaluation.html

Kahlenberg, R. D. (2015, Summer). Tenure: How Due Process Protects Teachers and Students. American Educator. http://www.aft.org/ae/summer2015/kahlenberg

Goldstein, D. (2014). The Teacher Wars: A History of America’s Most Embattled Profession. New York: Doubleday.

Teacher Quality: Putting the Issue in Perspective
Haertel, E. H. (2013). Reliability and Validity of Inferences About Teachers Based on Student Test Scores. Educational Testing Service. http://www.ets.org/research/policy_research_reports/publications/publication/2013/jquq

Putnam, R. D. (2015). Our Kids: The American Dream in Crisis. New York: Simon & Schuster. For other works on the same topic, see Morsy, L., & Rothstein, R. (2015, Jun 10). Five Social Disadvantages That Depress Student Performance: Why Schools Alone Can’t Close Achievement Gaps. Economic Policy Institute. http://www.epi.org/publication/five-social-disadvantages-that-depress-student-performance-why-schools-alone-cant-close-achievement-gaps/?utm_source=Economic+Policy+Institute&utm_campaign=26b9c8a34e-EPI_News_06_12_156_12_2015&utm_medium=email&utm_term=0_e7c5826c50-26b9c8a34e-55876685 Berliner, D. C. (2013). Effects of Inequality and Poverty vs. Teachers and Schooling on America’s Youth. www.tcrecord.org/content.asp?contentid=16889 See also Summers, L. H., & Balls, E. (2015, Jan). Report of the Commission on Inclusive Prosperity. https://www.americanprogress.org/issues/economy/report/2015/01/15/104266/report-of-the-commission-on-inclusive-prosperity/

Rich, M., Cox, A., & Bloch, M. (2016, Apr 29). Money, Race, and Success: How Your School District Compares. The New York Times. http://www.nytimes.com/interactive/2016/04/29/upshot/money-race-and-success-how-your-school-district-compares.html?_r=3

Darling-Hammond, L., Wilhoit, G., & Pittenger, L. (2014, Oct 16). Accountability for College and Career Readiness: Developing a New Paradigm. Stanford Center for Opportunity Policy in Education. https://edpolicy.stanford.edu/publications/pubs/1257

Glass, G. V. (2016, Apr 5). Take All the Credit? You’ll Get All the Blame. http://ed2worlds.blogspot.com/2016/04/take-all-credit-youll-get-all-blame.html?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+EducationInTwoWorlds+%28Education+in+Two+Worlds%29

Strauss, V. (2016, Mar). No, Great Schools Can’t Close Achievement Gaps All by Themselves. The Washington Post. https://www.washingtonpost.com/news/answer-sheet/wp/2016/03/21/no-great-schools-cant-close-achievement-gaps-all-by-themselves/

A Broader, Bolder Approach to Education. http://www.boldapproach.org/

Sawhill, I. V., & Rodrigue, E. (2015, Nov 18). An Agenda for Reducing Poverty and Improving Opportunity. Brookings. http://www.brookings.edu/research/papers/2015/11/campaign-2016-presidential-candidates-poverty-and-opportunity

Kerwin McCrimmon, K. (2015, Aug 27). Private Money Saves Colorado IUD Program as Fight Continues for Public Funding. Kaiser Health News. http://khn.org/news/private-money-saves-colorado-iud-program-as-fight-continues-for-public-funding/ See also Tavernise, S. (2015, Jul 5). Colorado’s Effort Against Teenage Pregnancies Is a Startling Success. The New York Times. http://www.nytimes.com/2015/07/06/science/colorados-push-against-teenage-pregnancies-is-a-startling-success.html

Tests Are Not Reliable Measures of Teacher Performance
American Educational Research Association and National Academy of Education. Getting Teacher Evaluation Right: A Brief for Policymakers. https://edpolicy.stanford.edu/publications/pubs/421

American Statistical Association. (2014, Apr 8). ASA Statement on Using Value-Added Models for Educational Assessment. OpEd News. http://www.opednews.com/Quicklink/ASA-Statement-on-Using-Val-in-Best_Web_OpEds-Administration_Caution_Mandates_Teacher-140412-203.html

American Educational Research Association. (2015, Nov). AERA Statement of Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs. Educational Researcher. http://online.sagepub.com/search/results?submit=yes&src=hw&andorexactfulltext=and&fulltext=AERA+Statement+of+Use+of+Value-Added+Models+%28VAM%29+for+the+Evaluation+of+Educators+and+Educator+Preparation+Programs&x=0&y=0

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012, Mar 15). Evaluating Teacher Evaluation: What We Know About Value-Added Models and Other Methods. Phi Delta Kappan. http://www.edweek.org/ew/articles/2012/03/01/kappan_hammond.html

Lavigne, A. L., & Good, T. L. (2013). Teacher and Student Evaluation: Moving Beyond the Failure of School Reform. New York: Routledge.

Stiggins, R. J. (2014). Defensible Teacher Evaluation: Student Growth Through Classroom Assessment. Thousand Oaks, CA: Corwin Press.

Ballou, D., & Springer, M. G. (2015). Using Student Test Scores to Measure Teacher Performance: Some Problems in the Design and Implementation of Evaluation Systems. Educational Researcher. http://edr.sagepub.com/content/44/2/77.full.pdf+html?ijkey=WSTBFIHcTyO9I&keytype=ref&siteid=spedr

Amrein-Beardsley, A. (n.d.). Top 15 Research Articles About VAMS. http://vamboozled.com/research-articles-on-vams/

Amrein-Beardsley, A. (n.d.). All Recommended Articles About VAMS. http://vamboozled.com/recommended-reading/value-added-models/

Shavelson, R. J., Linn, R. L., Baker, E. L., et al. (2010, Aug 27). Problems with the Use of Student Test Scores to Evaluate Teachers. Economic Policy Institute. http://www.epi.org/publication/bp278/

Yettick, H. (2014, May 13). Studies Highlight Complexities of Using Value-Added Measures. Education Week. http://www.edweek.org/ew/articles/2014/05/13/32value-add.h33.html

Haertel, E. H. (2013). Reliability and Validity of Inferences About Teachers Based on Student Test Scores. Educational Testing Service. http://www.ets.org/research/policy_research_reports/publications/publication/2013/jquq

Casey, L. M. (2013). The Will to Quantify: The “Bottom Line” in the Market Model of Education Reform. Teachers College Record. http://www.tcrecord.org/Content.asp?ContentId=17107

Amrein-Beardsley, A., Collins, C., Holloway-Libell, J., & Paufler, N. (2016, Jan 5). Everything is Bigger (and Badder) in Texas: Houston’s Teacher Value-Added System. Teachers College Record. http://www.tcrecord.org/Content.asp?ContentId=18983

Lash, A., Makkonen, R., Tran, L., & Huang, M. (2016, Jan). Analysis of the Stability of Teacher-Level Growth Scores from The Student Growth Percentile Model. WestEd. https://relwest.wested.org/resources/210?utm_source=REL+West+Mailing+List&utm_campaign=05c37febff-ee-4-1&utm_medium=email&utm_term=0_316bfe94f7-05c37febff-92259833

Yeh, S. (2013). A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling. Teachers College Record. http://www.tcrecord.org/Content.asp?ContentID=16934

Amrein-Beardsley, A. (2015, Apr 30). The (Relentless) Will to Quantify. http://vamboozled.com/the-relentless-will-to-quantify/

Anrig, G. (2015, Mar 25). Value Subtracted: Gov. Cuomo’s Plot to Tie Teacher Evaluations to Test Scores Won’t Help Our Public Schools. Slate. http://www.slate.com/articles/life/education/2015/03/gov_andrew_cuomo_and_teacher_evaluations_standardized_test_scores_are_the.html

Berliner, D. C. (2015, Aug 11). Teacher Evaluation and Standardized Tests: A Policy Fiasco. Melbourne Graduate School of Education. http://education.unimelb.edu.au/news_and_activities/events/upcoming_events/dean_lecture_series/dls-past-2015/teacher-evaluation-and-standardised-tests-a-policy-fiasco

Evaluations Based on Test Scores Misidentify Teachers
Schochet, P. Z., & Chiang, H. S. (2010, Jul). Technical Methods Report: Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains. National Center for Education Evaluation and Regional Assistance. https://ies.ed.gov/ncee/pubs/20104004/

Bitler, M. P., Corcoran, S. P., Domina, T., & Penner, E. K. (2014, Spring). Teacher Effects on Student Achievement and Height: A Cautionary Tale. The Society for Research on Educational Effectiveness. https://archive.org/stream/ERIC_ED562824/ERIC_ED562824_djvu.txt

Examples of Test-Based Evaluations That Fail Exemplary Teachers
Hirsch, M. (2012, Mar 1). The True Story of Pascale Mauclair. New Politics. http://newpol.org/content/true-story-pascale-mauclair

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012, Mar 15). Evaluating Teacher Evaluation. Phi Delta Kappan. http://www.edweek.org/ew/articles/2012/03/01/kappan_hammond.html

Ravitch, D. (2015, Aug 7). Bruce Lederman Explains the Challenge to New York State Teacher Evaluation System. http://dianeravitch.net/2015/08/07/bruce-lederman-explains-the-challenge-to-new-york-state-teacher-evaluation-system/

Harris, E. A. (2016, May 10). Court Vacates Long Island Teacher’s Evaluation Tied to Student Test Scores. The New York Times. http://www.nytimes.com/2016/05/11/nyregion/court-vacates-long-island-teachers-evaluation-tied-to-student-test-scores.html

Pallas, A. (2012, May 16). Meet the “Worst” 8th Grade Math Teacher in NYC. The Washington Post. http://www.washingtonpost.com/blogs/answer-sheet/post/meet-the-worst-8th-grade-math-teacher-in-nyc/2012/05/15/gIQArmlbSU_blog.html

Other Flaws in Test-Based Evaluation Systems
Whitehurst, G. J., Chingos, M. M., & Lindquist, K. M. (2015, Winter). Getting Classroom Observations Right. Education Next. http://educationnext.org/getting-classroom-observations-right/ See also Kirby, A. (2016, Jan 21). Study Finds Flaws in Teacher Performance Observations. https://www.cabinetreport.com/human-resources/study-finds-flaws-in-teacher-performance-observations

Amrein-Beardsley, A. (2015, Oct 8). Teacher Evaluation Systems “At Issue” Across U. S. Courts. http://vamboozled.com/teacher-evaluation-systems-at-issue-across-u-s-courts/

Paufler, N. A., & Amrein-Beardsley, A. (2013, Jul 25). The Random Assignment of Students into Elementary Classrooms: Implications for Value-Added Analyses and Interpretations. American Educational Research Journal. http://aer.sagepub.com/content/51/2/328.

Condie, S., Lefgren, L., & Sims, D. (2014, Jun). Teacher Heterogeneity, Value-Added and Education Policy. Economics of Education Review. http://www.sciencedirect.com/science/article/pii/S0272775713001647

Test-Based Evaluations Do Not Measure Good Teaching and Harm the Profession
Polikoff, M. S., & Porter, A. C. (2014, May). Instructional Alignment as a Measure of Teaching Quality. Educational Evaluation and Policy Analysis. http://www.aera.net/Newsroom/RecentAERAResearch/InstructionalAlignmentasaMeasureofTeachingQuality/tabid/15510/Default.aspx See also Barshay, J. (2014, May 13). Researchers Give Failing Marks to National Effort to Measure Good Teaching. http://educationbythenumbers.org/content/researchers-say-pennsylvanias-measurement-teacher-effectiveness-doesnt-measure-good-teaching_1238/ and Ravitch, D. (2016, Mar 16). John Thompson: The Utter Failure of Standardized Teacher Evaluation. http://dianeravitch.net/2016/03/16/johnthompson-the-utter-failure/

Johnson, S. M. (2015, Jul 29). Four Unintended Consequences of Using Student Test Scores to Evaluate Teachers. The Washington Post. http://www.washingtonpost.com/blogs/answer-sheet/wp/2015/07/29/four-unintended-consequences-of-using-student-test-scores-to-evaluate-teachers/

Kirby, A. (2015, Aug 27). High-Stakes Teacher Evaluations May Not Help. https://www.cabinetreport.com/human-resources/high-stakes-teacher-evaluations-may-not-help See also Bryant, J. (2016, Apr). We Won’t Improve Education by Making Teachers Hate Their Jobs. http://educationopportunitynetwork.org/we-wont-improve-education-by-making-teachers-hate-their-jobs/

Kwalwasser, H. (2015, Sep 15). Standardized Tests Don’t Help Us Evaluate Teachers. Los Angeles Times. http://www.latimes.com/opinion/op-ed/la-oe-0910-kwalwasser-standardized-testing-problems-20150910-story.html

Goldstein, D. (2014). The Teacher Wars: A History of America’s Most Embattled Profession. New York: Doubleday.

Amrein-Beardsley, A. (2015, Dec 29). VAMboozled!: Why Standardized Tests Should Not Be Used to Evaluate Teachers (and Teacher Education Programs). http://nepc.colorado.edu/blog/why-standardized-tests

Toch, T. (2016, May). Grading the Graders: A Report on Teacher Evaluation Reform in Public Education. Center on the Future of American Education. https://georgetown.app.box.com/s/f47qnfh63wfxhxqu88pu5r0y0tkbo6bk

Feintzeig, R. (2015, Apr 21). The Trouble with Grading Employees. The Wall Street Journal. http://www.wsj.com/articles/the-trouble-with-grading-employees-1429624897 See also Korkki, P. (2015, Jul 11). Why Employee Ranking Can Backfire. The New York Times. http://mobile.nytimes.com/2015/07/12/business/why-employee-ranking-can-backfire.html?_r=1&referrer

Ravitch, D. (2015, Oct 21). John Thompson: The Gates Plan Failed in Tulsa, Now What? http://dianeravitch.net/2015/10/21/john-thompson-the-gates-plan-failed-in-tulsa-now-what/

Kirby, A. (2015, Nov 9). Michigan Bill Rolls Back Test Scores in Teacher Evaluations. https://cabinetreport.com/politics-education/michigan-bill-rolls-back-test-scores-in-teacher-evaluations

Amrein-Beardsley, A. (2015, Nov 9). Houston Board Candidates Respond to Their Teacher Evaluation System. http://vamboozled.com/?s=Houston+board+candidates&submit=Search&__bcf_gupi=1DCE61EDFC3F0001C87B1A304D9B1E821DCE61EDFC4000013A7E99739110F630

Gandha, T. (2016, Feb). State Actions to Advance Teacher Evaluation. Southern Regional Education Board. http://www.sreb.org/publication/state-actions-advance-teacher-evaluation

Tom Torlakson’s Task Force. (2012, Sep). Greatness by Design: Supporting Outstanding Teaching to Sustain a Golden State. http://www.cde.ca.gov/eo/in/ee.asp

Danielson, C. (2016, Apr 18). Charlotte Danielson on Rethinking Teacher Evaluation. Education Week. http://www.edweek.org/ew/articles/2016/04/20/charlotte-danielson-on-rethinking-teacher-evaluation.html?cmp=eml-enl-eu-news2-RM

Taylor, K. (2015, Nov 25). Cuomo, in Shift, Is Said to Back Reducing Test Scores’ Role in Teacher Reviews. The New York Times. http://www.nytimes.com/2015/11/26/nyregion/cuomo-in-shift-is-said-to-back-reducing-test-scores-role-in-teacher-reviews.html?ref=topics&_r=0

Disare, M. (2015, Dec 14). In Big Shift, Regents Vote to Exclude State Tests from Teacher Evals Until 2019. http://ny.chalkbeat.org/2015/12/14/breaking-in-big-shift-regents-vote-to-exclude-state-tests-from-teacher-evals-until-2019/?utm_source=Master+Mailing+List&utm_campaign=f54d1b9f78-Rise_Shine_201912_15_2015&utm_medium=email&utm_term=0_23e3b96952-f54d1b9f78-75668293#.VnA8EI-cE2y

Tyrrell, J. (2015, Nov 21). Nassau Superintendents: End Teacher Evals Tied to Test Scores. Newsday. http://www.newsday.com/long-island/nassau/nassau-superintendents-end-teacher-evals-tied-to-test-scores-1.11150791

Layton, L. (2015, Nov 16). Clinton Says “No Evidence” That Teachers Can Be Judged by Student Test Scores. The Washington Post. https://www.washingtonpost.com/local/education/clinton-says-no-evidence-that-teachers-can-be-judged-by-student-test-scores/2015/11/16/303ee068-8c98-11e5-baf4-bdf37355da0c_story.html

Ravitch, D. (2015, Dec 17). John Thompson: The Beginning of the End of VAM? http://dianeravitch.net/2015/12/17/john-thompson-the-beginning-of-the-end-of-vam/

Sawchuk, S. (2013, Apr 4). Bill Gates: Don’t Overuse Tests in Teachers’ Evaluations. http://blogs.edweek.org/edweek/teacherbeat/2013/04/bill_gates_dont_overuse_tests_in_teachers_evaluations.html See also Layton, L. (2015, Oct 7). Improving U.S. Schools Tougher than Global Health, Gates Says. The Washington Post. https://www.washingtonpost.com/local/education/improving-us-schools-tougher-than-global-health-gates-says/2015/10/07/56da9972-6d05-11e5-b31c-d80d62b53e28_story.html

A More Effective Approach to Teacher Evaluation
Quintero, E. (2016, Feb 23). Beyond Teacher Quality. http://www.shankerinstitute.org/blog/beyond-teacher-quality

Johnson, S. M. (2015, Jun 25). Will Value-Added Reinforce the Walls of the Egg-Crate School? http://www.shankerinstitute.org/blog/will-value-added-reinforce-walls-egg-crate-school

Kini, T., & Podolsky, A. (2016). Does Teaching Experience Increase Teacher Effectiveness? A Review of the Research. Learning Policy Institute. https://learningpolicyinstitute.org/our-work/publications-resources/does-teaching-experience-increase-teacher-effectiveness-review-research/

Public Impact. (2015). Evaluation, Accountability, and Professional Development in an Opportunity Culture. Opportunity Culture. http://opportunityculture.org/evaluation-policy-brief/

Lavigne, A. L., & Good, T. L. (2015). Improving Teaching Through Observation and Feedback: Beyond State and Federal Mandates. New York: Routledge.

Network for Public Education. (2016). Teachers Talk Back: Educators on the Impact of Teacher Evaluation. http://networkforpubliceducation.org/2016/04/6468/

Does Dismissing Incompetent Teachers Improve Student Outcomes?
Kraft, M. A., & Gilmour, A. F. (2016, Feb). Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness. Brown University. http://scholar.harvard.edu/mkraft/publications/revisiting-widget-effect-teacher-evaluation-reforms-and-distribution-teacher

Hess, R. (2016, Mar 8). When Fancy New Teacher-Evaluation Systems Don’t Make a Difference. http://mobile.edweek.org/c.jsp?cid=25920011&item=http%3A%2F%2Fapi.edweek.org%2Fv1%2Fblog%2F76%2F%3Fuuid%3D57146

Lowrey, A. (2012, Jan 6). Big Study Links Good Teachers to Lasting Gain. The New York Times. http://www.nytimes.com/2012/01/06/education/big-study-links-good-teachers-to-lasting-gain.html?_r=0

Ravitch, D. (2014, Aug 11). The Holes in the Chetty et al VAM Study as Seen by the American Statistical Association. http://dianeravitch.net/2014/08/11/the-holes-in-the-chetty-et-al-vam-study-as-seen-by-the-american-statistical-association/

Rothstein, J., & Mathis, W. J. (2013, Jan 31). Review of Two Culminating Reports from the MET Project. National Education Policy Center. http://nepc.colorado.edu/thinktank/review-MET-final-2013

DeWitt, P. (2015, May 11). 3 Reasons Why Your Observations May Be a Waste of Time. http://blogs.edweek.org/edweek/finding_common_ground/2015/05/3_reasons_why_your_observations_may_be_a_waste_of_time.html

WestEd. (2015, Sep). Video: Making Meaningful Use of Teacher Effectiveness Data. https://relwest.wested.org/resources/198

Can Evaluations by Principals Fix the Problems of Test-Based Accountability?
Kirby, A. (2016, Jan 21). Study Finds Flaws in Teacher Performance Observations. https://www.cabinetreport.com/human-resources/study-finds-flaws-in-teacher-performance-observations

Hallinger, P., Heck, R. H., & Murphy, J. (2013, Jul 30). Leading via Teacher Evaluation: The Case of the Missing Clothes? Educational Researcher. http://ecs.force.com/studies/rstudypg?id=a0r70000003ql6SAAQ

American Educational Research Association. (2016, Mar). Educational Evaluation and Policy Analysis. http://eepa.aera.net See also Di Carlo, M. (2015, Feb 25). Student Sorting and Teacher Classroom Observations. http://www.shankerinstitute.org/blog/student-sorting-and-teacher-classroom-observations and Garrett, R., & Steinberg, M. P. (2015, May 21). Examining Teacher Effectiveness Using Classroom Observation Scores. http://epa.sagepub.com/content/early/2014/06/13/0162373714537551

Lavigne, A. L., & Good, T. L. (2015). Improving Teaching Through Observation and Feedback: Beyond State and Federal Mandates. New York: Routledge.

Devaney, L. (2016, Jan 19). Classroom Observations May Hurt Teachers More Than They Help, Study Says. eSchool News. http://www.eschoolnews.com/2016/01/19/classroom-observations-may-hurt-teachers-more-than-they-help-study-says/

Di Carlo, M. (2015, Dec 4). Evidence from a Teacher Evaluation Pilot Program in Chicago. http://www.shankerinstitute.org/blog/evidence-teacher-evaluation-pilot-program-chicago

DeWitt, P. (2016, Apr 19). The Myth of Walkthroughs: 8 Unobserved Practices in Classrooms. http://blogs.edweek.org/edweek/finding_common_ground/2016/04/the_myth_of_walkthroughs_8_unobserved_practices_in_classrooms.html

A Narrow Focus on Dismissing Teachers Detracts from Effective Improvement Measures
American Educational Research Association and National Academy of Education. Getting Teacher Evaluation Right: A Brief for Policymakers. https://edpolicy.stanford.edu/publications/pubs/421

Thompson, J. (2015, Sep 10). The Rhino in the Room: Time to End Disruptive Reform. http://www.livingindialogue.com/the-rhino-in-the-room-time-to-end-disruptive-reform/

The New Teacher Project. (2013, Jul 30). The Irreplaceables: Understanding the Real Retention Crisis in America’s Urban Schools. http://tntp.org/publications/view/retention-and-school-culture/the-irreplaceables-understanding-the-real-retention-crisis

Knudson, J. (2013, Sep). You’ll Never Be Better Than Your Teachers: The Garden Grove Approach to Human Capital Development. http://eric.ed.gov/?q=source%3a%22California+Collaborative+on+District+Reform%22&id=ED557950 See also Tucker, M. (2016, Apr). How to Get a First-Rate Teacher in Front of Every Student. http://blogs.edweek.org/edweek/top_performers/2016/04/how_to_get_a_first-rate_teacher_in_front_of_every_student.html?utm_source=feedblitz&utm_medium=FeedBlitzRss&utm_campaign=top_performers

Sawhill, I. V. (2015, Sep 8). Does Money Matter? http://www.brookings.edu/research/opinions/2015/09/08-does-money-matter-education-sawhill See also Jackson, C. K., Johnson, R. C., & Persico, C. (2015, Fall). Boosting Educational Attainment and Adult Earnings. http://educationnext.org/boosting-education-attainment-adult-earnings-schoolspending/

Ravitch, D. (2015, Oct 20). Indiana: Less Money, More Chaos. http://dianeravitch.net/2015/10/20/indiana-less-money-more-chaos/

Baker, B. (2012). Revisiting That Age Old Question: Does Money Matter in Education? http://eric.ed.gov/?q=Does+Money+Matter+in+Education&id=ED528632 See also Baker, B. (2016). Does Money Matter in Education? Second Edition. http://www.shankerinstitute.org/resource/does-money-matter and Spielberg, B. (2015, Oct 20). The Truth About School Funding. http://34justice.com/2015/10/20/the-truth-about-school-funding/

Leachman, M., Albares, N., Masterson, K., & Wallace, M. (2016, Jan 25). Most States Have Cut School Funding, and Some Continue Cutting. http://www.cbpp.org/research/state-budget-and-tax/most-states-have-cut-school-funding-and-some-continue-cutting

Kane, T. J., Owens, A. M., Marinell, W. H., Thal, D. R. C., & Staiger, D. O. (2016, Feb). Teaching Higher: Educators’ Perspectives on Common Core Implementation. http://cepr.harvard.edu/teaching-higher See also Hull, S. J. (2015, Oct 14). Principals Matter—And They Need the Right Start. http://www.learningfirst.org/principals-matter-and-they-need-right-start?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+LFA+%28Public+School+Insights%3A+What+is+WORKING+in+our+Public+Schools%29 and Center for Education Policy Research. (2014–16). Teaching Higher: Educators’ Perspectives on Common Core Implementation. http://cepr.harvard.edu/teaching-higher

Tucker, M. (2015, Aug 13). Organizations in Which Teachers Can Do Their Best Work: Part I. http://blogs.edweek.org/edweek/top_performers/2015/08/organizations_in_which_teachers_can_do_their_best_work_part_i.html

Tucker, M. (2015, Aug 20). Organizations in Which Teachers Can Do Their Best Work: Part II. http://blogs.edweek.org/edweek/top_performers/2015/08/organizations_in_which_teachers_can_do_their_best_work_part_ii.html

The Wallace Foundation. (2013, Jan). The School Principal as Leader: Guiding Schools to Better Teaching and Learning. http://www.wallacefoundation.org/knowledge-center/Pages/The-School-Principal-as-Leader-Guiding-Schools-to-Better-Teaching-and-Learning.aspx

Superville, D. R. (2015, Oct 23). New Professional Standards for School Leaders Are Approved. http://blogs.edweek.org/edweek/District_Dossier/2015/10/new_professional_standards_for.html?r=608789257

Targeting the Lowest-Performing Schools with Closure and Other Drastic Measures Is Usually Ineffective
Miller, T. D., & Brown, C. (2015, Mar 31). Dramatic Action, Dramatic Improvement. Center for American Progress. http://nepc.colorado.edu/thinktank/review-school-turnaround See also Burris, C. (2015, Sep 4). School Closures: A National Look at a Failed Strategy. http://www.networkforpubliceducation.org/2015/09/school-closures-a-national-look-at-a-failed-strategy-2/?can_id=012f354d90b87664b362dda6a4b2980d&source=email-school-closures-a-national-look-at-a-failed-strategy&email_referrer=school-closures-a-national-look-at-a-failed-strategy and American Institutes for Research and Mathematica Policy Research. (2015, May). Evaluation Brief: State Capacity to Support School Turnaround. National Center for Education Evaluation and Regional Assistance. https://ies.ed.gov/ncee/pubs/20154012/

Trujillo, T., & Renée, M. (2012, Oct 1). Democratic School Turnarounds: Pursuing Equity and Learning from Evidence. National Education Policy Center. http://nepc.colorado.edu/publication/democratic-school-turnarounds

Aragon, S., & Workman, E. (2015, Oct). Emerging State Turnaround Strategies. Education Commission of the States. http://www.ecs.org/press-release-emerging-state-turnaround-strategies/ See also Felton, E. (2015, Oct 19). Are Turnaround Districts the Answer for America’s Worst Schools? http://hechingerreport.org/are-turnaround-districts-the-answer-for-americas-worst-schools/

Ehrenhalt, A. (2013, Oct). School Scandals Reveal the Problem with Grading Schools. Governing. http://www.governing.com/columns/col-school-scandals-reveal-testing-ignorance.html

Di Carlo, M. (2015, Jun). The Evidence on the “Florida Formula” for Education Reform. Albert Shanker Institute. http://www.shankerinstitute.org/resource/evidence-florida-formula-education-reform

Sparks, S. D. (2015, Oct 6). Studies Probe How Schools Widen Achievement Gaps. Education Week. http://www.edweek.org/ew/articles/2015/10/07/schools-help-widen-academic-gaps-studies-find.html?r=258221469&cmp=eml-enl-eu-news1-RM

Klein, A. (2014, Sep 15). New Turnaround Options Detailed in Draft SIG Guidance. Education Week. http://www.edweek.org/ew/articles/2014/09/17/04sig.h34.html

Fullan, M. (2011, Nov 17). Choosing the Wrong Drivers for Whole System Reform. http://education.qld.gov.au/projects/educationviews/news-views/2011/nov/talking-point-fullan-101117.html

Fiske, E. B., & Ladd, H. F. (2016, Feb 13). Learning from London About School Improvement. The News & Observer. http://www.newsobserver.com/opinion/op-ed/article60118256.html

Building Better Schools

Bill Honig– positive support of public schools

Why Conventional School “Reforms” Have Failed
Teacher and School Evaluations Are Based on Test Scores

Firing Teachers Based on Students’ Test Scores Is Not the Answer

Teacher Quality: Putting the Issue in Perspective

Tests Are Not Reliable Measures of Teacher Performance

Evaluations Based on Test Scores Misidentify Teachers

Examples of Test-Based Evaluations That Fail Exemplary Teachers

Other Flaws in Test-Based Evaluation Systems

Test-Based Evaluations Do Not Measure Good Teaching and Harm the Profession

A More Effective Approach to Teacher Evaluation

Does Dismissing Incompetent Teachers Improve Student Outcomes?

Can Evaluations by Principals Fix the Problems of Test-Based Accountability?

A Narrow Focus on Dismissing Teachers Detracts from Effective Improvement Measures

Targeting the Lowest-Performing Schools with Closure and Other Drastic Measures Is Usually Ineffective

Recent Developments

BBS Companion Articles

Reference Notes
