


How Top Performers Build-and-Support
Exemplary Models

by Bill Honig

Build-and-Support strategies are not only grounded in extensive research but have also proved to significantly improve performance in the districts, states and provinces, and nations that have followed them.

School Districts

There are examples of stellar districts that have achieved successful results by following Build-and-Support ideas. These include Long Beach, Garden Grove, Sanger, Whittier Union High School District, and Elk Grove, as well as the High Tech High, Summit, and Aspire charter school networks, all in California; Montgomery County, Maryland; and Union City, New Jersey. All have pursued this more comprehensive, positive approach for years and place in the top ranks of international assessments. Conversely, Dallas, Texas, and Newark, New Jersey, are examples of the damage caused by a full “reform” strategy and its failure to produce results.

Sanger’s journey—from a low-performing, high-poverty district suffering from substantial labor strife to a high-performing district where teachers and administrators have forged a close working relationship—demonstrates the power of the Build-and-Support strategy. Ironically, as a prime example of the deleterious effect of federal policy, in 2014 Sanger accepted a federal waiver under duress to avoid the severe penalties of NCLB (imposed by the federal government even though Sanger grew faster than almost every other district in the state). District leaders then became worried that the forced implementation of test-driven evaluation would reverse their successful collaboration efforts. The problem should be resolved in 2016, when the new ESSA measure becomes operative and high-stakes teacher evaluation can no longer be federally mandated.

Similarly, Long Beach Unified School District, identified as one of the three top school jurisdictions in the country and among the top 20 in the world, has been building professional capacity around a strong, core curriculum for several decades with significant results. According to its superintendent, Chris Steinhauser, Long Beach’s success stems from its attention to human and social capital development, including clinical experiences for new teachers; treating educators, parents, and community members with respect and trust; providing extensive coaching support for teachers and principals; orienting the district administrators to support schools; building teams at schools; implementing a strong liberal arts curriculum with a districtwide focus; developing cooperation with colleges and community organizations; and continuing a shared focus by all on instructional and curricular quality. Again, Long Beach has had consistent leadership for the past two decades under Carl Cohn (1992–2002) and Superintendent Steinhauser (2002–present). Long Beach has pursued educational improvement by developing a districtwide strategy that engages all teachers and schools in the effort as opposed to a punitive approach aimed at the lowest-performing schools. For why this is important, see Fiske and Ladd’s comments. Finally, Long Beach has struck the right balance between school and teacher autonomy and district leadership, which is crucial in allowing each school to implement improvement efforts in its own way while adhering to an overall district strategy. For a perceptive article on this issue, see Larry Cuban’s blog.

Another example is Garden Grove, which has one of the largest percentages of English-language learners of any large district in California yet has improved performance substantially over the last 15 years. Under the exemplary leadership of Laura Schwalm, superintendent from 1999 to 2013, and of Gabriela Mafi since 2013, the district, among other Build-and-Support measures, has developed a robust human resources development program with two aspects. First, the district finds and keeps the best teachers through effective systems for recruitment, placement, induction, tenure decisions, and compensation. Second, it builds the capacity of current staff through comprehensive professional development, effective school site teams, and career advancement pathways that offer its best teachers a hybrid teaching and leadership role and the possibility of higher earnings.

These successful jurisdictions don’t ignore accountability. But effective accountability must not rely solely or primarily on test scores. It should be designed around providing useful, timely feedback that will assist school, district, and local community efforts in improving instruction and student performance. And it should assiduously avoid causing the type of extensive collateral damage we have seen under high-stakes testing: narrowing the curriculum, discouraging cooperation, and emphasizing looking good on tests rather than providing quality instruction.

This more supportive philosophy guides the accountability system being developed in California and many other states. The state will be establishing an integrated hybrid of state and local indicators such as graduation rates, college and career preparation, Advanced Placement course passage, curriculum breadth and depth, student and teacher engagement, school climate, student suspensions and teacher absences, reclassification rates for English-language learners, and implementation and team-building efforts. The main locus of accountability is the school and district, with local community participation, under the assumption and trust that the professionals in the school, not the federal government or the state, will be the driving force for improvement if they have the support they need. For an up-to-date report on these broader accountability ideas, see a 2016 paper by Linda Darling-Hammond and colleagues, Pathways to New Accountability Through the Every Student Succeeds Act. In addition, see a 2016 report by Cook-Harvey and Stosich of the Stanford Center for Opportunity Policy in Education, Redesigning School Accountability and Support: Progress in Pioneering States.

Data based on reasonable student testing and just-in-time student assessment are helpful when they provide information back to teachers, schools, and local communities to assist their continuous improvement efforts. California is a member of the Smarter Balanced Assessment Consortium (SBAC) and administered the first state assessments in 2015. However, results won’t be used for accountability purposes until enough data are available for growth measures and potential targets can be validated. The state also wants to give teachers a chance to implement the curricular changes envisioned by the Common Core State Standards (CCSS). As mentioned above, though, these end-of-year, broad-scale tests should be only one part of a broader accountability system and need to be combined with more sophisticated, accurate, and authentic measures of student performance, such as end-of-course and periodic assessments and competency-based measures such as certificates, performances, portfolios, and projects.

Furthermore, state and district policy should recognize that negative fallout from testing is minimized if tests are not used primarily for formal, high-stakes teacher or school evaluations or to assess school progress toward impossible goals established by political entities that are far removed from the facts on the ground. Test results are most useful when viewed as one aspect of the main driver of improvement—a broad, collaborative, well-resourced effort to improve school, student, and teacher performance over the long haul.

There will be schools that struggle and need assistance. That assistance needs to be organized, as envisioned by the new California Collaborative for Educational Excellence, which will offer help, support, and site visits to struggling schools. For a national proposal along these lines, see Marc Tucker’s blog post “ESEA Reauthorization and Accountability: A Chance to Do It Right.”

Successful jurisdictions do not neglect the problem of incompetent teachers. It turns out that giving low-performing teachers a chance to improve is more effective when the efforts are part of a cooperative endeavor to improve instruction. First, many low-performing teachers will improve with helpful support. Second, low performers cannot easily hide in their classrooms if a concerted team effort is under way. For many, the exposure pushes them to improve or resign. California districts such as Long Beach, San Jose, and Garden Grove, as well as places such as Montgomery County, Maryland, and Massachusetts, are examples of jurisdictions that have embedded teacher evaluations in a broader instructional improvement effort, obtained union and teacher support, and used peer review techniques. They have found that this approach has proved more successful in dismissing or counseling out the worst teachers who cannot or will not improve, with considerably less collateral damage than the traditional method that relies entirely on a negative, high-pressure strategy.

A 2016 Aspen Institute report, Teacher Evaluation and Support Systems: A Roadmap for Improvement, chronicles practices from around the nation that exemplify this more supportive approach.

Nations and States

What have the most successful nations and states done to improve student performance?

On the world stage, high-performing Finland had a mediocre system two decades ago. It initiated a long-term positive engagement strategy and revitalization of the teaching force and now substantially outscores Norway, which has a similar population and demographics but is stuck in a test-driven accountability mode. Finnish Lessons 2.0: What Can the World Learn from Educational Change in Finland? is one of the best books on the topic. The author is Pasi Sahlberg, one of the primary leaders of the reforms.

William Doyle spent a year on a Fulbright scholarship studying the Finnish success story. He writes of a fantastic school in rural Finland and of conversations with one of Finland’s top teacher educators. He contrasts the Finnish attention to revitalizing the teaching profession with the prevailing conventional “reform” strategy in this country:

[I]n the U.S., instead of control, competition, stress, standardized testing, screen-based schools and loosened teacher qualifications, try warmth, collaboration, and highly professionalized, teacher-led encouragement and assessment.

I should note, however, that Finland has stalled or declined on recent Program for International Student Assessment (PISA) and Trends in International Mathematics and Science Study (TIMSS) tests. For a contrary view of Finland’s rise and recent stall or decline, see Real Finnish Lessons: The True Story of an Education Superpower. The author attributes Finland’s past successes not to its education initiatives but to the prominence teachers have always enjoyed in that country as nation builders, the determination of families stemming from Finland’s recent industrialization, and traditional teaching methods. The author further argues that the abatement of these factors is causing Finland’s test results to decline. The report was prepared by a conservative think tank in England co-founded by Margaret Thatcher, comparable to our Hoover Institution. The author doesn’t think much of student or teacher collaboration, but a raft of studies has shown that collaboration among teachers and improvements in social capital and the prestige of the profession do make a significant difference. It will be interesting to see how researchers respond to this contrarian position.

In Canada, the province of Ontario has followed the same successful trajectory—revitalizing the teaching profession, creating effective professional learning communities at each school around teaching a vigorous curriculum, and using the capacity-building approach. The result was a substantial improvement in student performance. Poland has undergone a similar transformation, using team building and continuous improvement strategies to boost performance and chalk up enviable progress, as described in Amanda Ripley’s book The Smartest Kids in the World: And How They Got That Way. (Ripley visited three foreign countries for examples of world-class educational efforts; it’s a shame she didn’t visit comparable examples in the US, such as Massachusetts.) Many Asian countries such as Japan, Korea, and Singapore and the city of Shanghai are among the highest performers in the world. All have been implementing continuous improvement strategies for decades. See, for example, Developing Shanghai’s Teachers. On the flip side, Chile and Sweden adopted wholesale charter and voucher approaches and suffered severe negative consequences.

There are many success stories closer to home, but, unfortunately, they are the exception, not the rule. Massachusetts is a poster child for why Build-and-Support works. Over the past 20 years, the state has consistently pursued a comprehensive, positive approach that engages, rather than vilifies, educators. It placed instruction at the core of its reforms, built capacity around improving classrooms and schools, upgraded the quality of the teaching force, and substantially increased funding. The Commonwealth carefully avoided most of the extreme reform approaches, such as widespread charterization, attacks on unions and due process protections, and punitive measures. Most importantly, Massachusetts has stayed the course for nearly two decades.

Specifically, in 1993 under the leadership of Commissioner of Education David Driscoll, the Bay State approved standards and curricular frameworks, developed an assessment system geared toward instructional improvement based on those standards and frameworks, organized professional development around the documents, raised requirements for graduation, installed rigorous charter school evaluations for approval, and initiated more stringent requirements and support for incoming teachers. Policymakers in Massachusetts also insisted that teachers earn a master’s degree over the course of their careers. (For a comparison with Finnish initiatives, see Lisa Hansel’s post “Seeking Confirmation” on the Core Knowledge blog.)

As a result, Massachusetts ranks first in the nation on NAEP by a wide margin. In international assessments it ranks near the top in math and science, and at the top in mathematics growth and performance. Yes, it is home to numerous universities with high-level candidates who pursue teaching careers, a well-educated population, and a history of educational excellence, but such advantages aren’t enough to explain its phenomenal world-class performance. Why the Massachusetts model has not become the guide for national and other states’ improvement efforts, as Marshall Smith suggested several years ago, is bewildering.

Reference Notes

School Districts
Ravitch, D. (2015, Jun 23). Mike Miles Resigns as Dallas Superintendent. http://dianeravitch.net/2015/06/23/breaking-news-mike-miles-resigns-as-dallas-superintendent/

David, J. L., & Talbert, J. E. (2012). Turning Around a High-Poverty School District: Learning from Sanger Unified’s Success. http://web.stanford.edu/group/suse-crc/cgi-bin/drupal/publications/report

Amadolare, S. (2014, Feb 27). Which Is Worse? A California District Makes a Tough Choice Between No Child Left Behind and Obama Education Policies. http://hechingerreport.org/which-is-worse-a-california-district-makes-a-tough-choice-between-no-child-left-behind-and-obama-education-policies/

Long Beach Unified School District. About Long Beach Unified School District. http://www.lbschools.net/District/

Mongeau, L. (2016, Feb 2). How One California City Saved Its Schools. http://hechingerreport.org/how-one-california-city-saved-its-schools/

Steinhauser, C. (2015). Personal conversation with author. See also Freedberg, L. (2016, Feb 22). State Must Adopt Guidelines for Parent Engagement in Schools. http://edsource.org/2016/report-state-must-adopt-guidelines-for-parent-engagement-in-schools/95124?utm_source=Feb.+23+daily+digest+–+Michael&utm_campaign=Daily+email&utm_medium=email

Fiske, E. B., & Ladd, H. F. (2016, Feb 13). Learning from London About School Improvement. The News & Observer. http://www.newsobserver.com/opinion/op-ed/article60118256.html

Cuban, L. (2016, Feb 17). Reflecting on School Reforms: Scaling Up versus Short, Happy Life or Hanging In. https://larrycuban.wordpress.com/2016/02/17/reflecting-on-school-reforms-scaling-up-versus-short-happy-life-or-hanging-in/

Knudsen, J. (2013, Sep). You’ll Never Be Better Than Your Teachers: The Garden Grove Approach to Human Capital Development. http://www.cacollaborative.org/publications

Darling-Hammond, L., Bae, S., Cook-Harvey, C.M., Lam, L., Mercer, C., Podolsky, A., & Stosich, E. (2016, Apr). Pathways to New Accountability Through the Every Student Succeeds Act. Learning Policy Institute. https://learningpolicyinstitute.org/our-work/publications-resources/pathways-new-accountability-every-student-succeeds-act/

Cook-Harvey, C. M., & Stosich E. L. (2016, Apr 5). Redesigning School Accountability and Support: Progress in Pioneering States. Stanford Center for Opportunity Policy in Education. https://edpolicy.stanford.edu/publications/pubs/1406

Tucker, M. (2015, Dec 3). ESEA Reauthorization and Accountability: A Chance to Do It Right. http://blogs.edweek.org/edweek/top_performers/2015/12/esea_reauthorization_and_accountability_a_chance_to_do_it_right.html

Brown, C., Partelow, L., & Konoske-Graf, A. (2016, Mar 16). Educator Evaluation: A Case Study of Massachusetts’ Approach. https://www.americanprogress.org/issues/education/report/2016/03/16/133038/educator-evaluation/

Thompson, J. (2015, Mar 30). John Thompson: A Teacher Proposes a Different Framework for Accountability. https://educationpost.org/john-thompson-a-teacher-proposes-a-different-framework-for-accountability/

The Aspen Institute. (2016, Mar). Teacher Evaluation and Support Systems: A Roadmap for Improvement. http://www.aspendrl.org/

Nations and States
Hancock, L. (2011, Sep). Why Are Finland’s Schools Successful? Smithsonian Magazine. http://www.smithsonianmag.com/innovation/why-are-finlands-schools-successful-49859555/?no-ist=

Sahlberg, P. (2015). Finnish Lessons 2.0: What Can the World Learn from Educational Change in Finland? New York: Teachers College Press.

Doyle, W. (2016, Feb 18). How Finland Broke Every Rule—and Created a Top School System. http://hechingerreport.org/how-finland-broke-every-rule-and-created-a-top-school-system/

Sahlgren, G. H. (2015, Apr). Real Finnish Lessons: The True Story of an Education Superpower. Centre for Policy Studies. http://www.cps.org.uk/publications/reports/real-finnish-lessons-the-true-story-of-an-education-superpower/

Ripley, A. (2014). The Smartest Kids in the World: And How They Got That Way. New York: Simon & Schuster.

Tucker, M. (2016, Feb 29). Asian Countries Take the U.S. to School. The Atlantic. http://www.theatlantic.com/education/archive/2016/02/us-asia-education-differences/471564/

Zhang, M., Ding, X., & Xu, J. (2016, Jan). Developing Shanghai’s Teachers. http://www.ncee.org/developing-shanghais-teachers/

Alliance for Excellent Education. David Driscoll. http://all4ed.org/people/david-driscoll/

Chang, K. (2013, Sep 2). Expecting the Best Yields Results in Massachusetts. The New York Times. http://www.nytimes.com/2013/09/03/science/expecting-the-best-yields-results-in-massachusetts.html?pagewanted=all&_r=0 See also Khadaroo, S. T. (2012, Sep 5). Is Top-Ranked Massachusetts Messing with Education Success? The Christian Science Monitor. http://www.csmonitor.com/USA/Education/2012/0905/Is-top-ranked-Massachusetts-messing-with-education-success

Hansel, L. (2015, Jul 9). Seeking Confirmation. http://blog.coreknowledge.org/2015/07/09/seeking-confirmation/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+TheCoreKnowledgeBlog+%28The+Core+Knowledge+Blog%29

Carnoy, M., García, E., & Khavenson, T. (2015, Oct 30). Bringing It Back Home: Why State Comparisons Are More Useful Than International Comparisons for Improving U.S. Education Policy. Economic Policy Institute. http://www.epi.org/publication/bringing-it-back-home-why-state-comparisons-are-more-useful-than-international-comparisons-for-improving-u-s-education-policy/



Why Conventional School “Reforms” Have Failed
Teacher and School Evaluations Are Based on Test Scores

by Bill Honig

The reform movement has failed to produce results overall, and reputable evaluations have shown that individual reform measures also proved ineffective. Turnaround schools, charter schools, merit pay, and test-based school and teacher accountability have had either nonexistent or trivial effects. In his book Visible Learning, John Hattie writes that even when reforms produced small gains, those gains fell far below the improvements brought about by validated initiatives. In this article, I examine the failure of one of the major initiatives of the reform movement: high-stakes teacher and school evaluations based on student test scores.

Firing Teachers Based on Students’ Test Scores Is Not the Answer

A major problem with the “reform” strategy is its tremendous overemphasis on removing incompetent teachers based on students’ test performance and enshrining mass firings as a key objective in school improvement efforts. For those who are seeking a “simple” way to improve educational outcomes, this approach has broad superficial appeal. Up until the repeal of No Child Left Behind (NCLB) and the passage of the Every Student Succeeds Act (ESSA) in late 2015, test-based accountability of teachers was a key component of the Obama administration’s educational policy and the price for relief—in the form of waivers—from arbitrary federal requirements. ESSA eliminates a national teacher evaluation system based on standardized test scores and the federal government’s ability to grant waivers.

Incompetent teachers should be let go if, and only if, credible and fair methods are used. But personnel changes must be part of a broader push for instructional improvements and efforts to raise the performance of all staff—measures that produce much higher effects on student achievement. For examples of these more positive measures, see the March 2016 Aspen Institute report Teacher Evaluation and Support Systems: A Roadmap for Improvement.

Up until several years ago, the reform agenda had primarily relied on test-driven, high-stakes accountability systems to punish or reward schools—a questionable enough approach, as discussed later in this article. A recent shift compounded the error when districts and states began to use the tests to also evaluate teachers and administrators and mete out punishment, termination, or rewards.

Often accompanied by hostile, anti-teacher rhetoric, teacher evaluation systems based on test scores became a central plank in the reform movement. That is why some “reformers” have virulently campaigned against teachers’ unions and due process (tenure) rights for teachers. They see these protections (which should be streamlined when they become too cumbersome) as preventing the unfettered ability to eliminate incompetent teachers and frustrating what in their minds is the most viable strategy to improve schools: firing bad teachers and pressuring the rest to improve.

As an aside, the article “Tenure: How Due Process Protects Teachers and Students” explains tenure in the context of due process rights and provides a cogent rationale for fair process protections during dismissal proceedings. Also see Dana Goldstein’s excellent book The Teacher Wars: A History of America’s Most Embattled Profession, which provides a gut-wrenching picture of the harm and arbitrary treatment teachers received before these due process rights were secured.

Many reformers as well as their political and media supporters frame the current debate about educational direction as a clash between themselves—the only ones who are trying to improve schools—and lazy or incompetent teachers and their unions. They contend that those who attempt to block their reform efforts are just trying to protect teacher prerogatives. This is why many policymakers and pundits take a confrontational rather than a cooperative stance. But educators’ opposition to the reform platform is much broader and goes much deeper than this all-too-common specious analysis. It is not that teachers (and their representatives) do not want to improve performance or that they do not see the need for schools to get better at what they are doing. Almost every professional wants that. What teachers and most district and school administrators object to is the path reformers have laid out toward accomplishing that goal. They view Test-and-Punish initiatives as ill-advised and ineffective at best and detrimental at worst. And they are correct.

Teacher Quality: Putting the Issue in Perspective

Contrary to the reform movement’s superficial and overheated rhetoric, the quality of teachers, while significant, is not the only important influence on student performance. According to various research studies, it accounts for only about 10% of the variance in student achievement. Bashing and blaming teachers is not a new trick. As recounted in Goldstein’s The Teacher Wars, this destructive ploy has emerged several times in our history, driven by “moral panic.” It is unjust to single out teachers as the primary cause of underperforming students and schools while failing to address more influential factors.

As one example, family and social dysfunction is on the rise and has had a devastating effect on educational performance. This is particularly true among working-class families. Robert Putnam’s important new book, Our Kids: The American Dream in Crisis, reveals the alarming growth in recent decades of social pathology among white working-class families. During the same period, professional families have stayed much more stable. Socioeconomic levels continue to significantly outrank all other influences on student performance.

Of course, it is easier to blame teachers for not reversing the damage done by wage stagnation and the dramatic decrease in blue-collar jobs in this country over the past decades rather than tackle these larger problems directly. In the US, we have seen rising levels of inequality, the increase of single-parent families, a steady climb in drug use, and the dearth of supportive services. Reformers’ penchant for blaming teachers and school administrators for low school performance conveniently absolves other societal institutions and actors of their responsibility for ameliorating injurious socioeconomic trends. Stanford professor Linda Darling-Hammond, one of the most respected commentators on how best to improve schools, offers an alternative view. She has called for “reciprocal accountability,” which requires that all major stakeholders share the responsibility for school performance improvement, not just teachers who are so easily scapegoated.

Julie Rummel provides a poignant, much-needed teacher’s perspective on the harsh reality encountered in many of our schools. She moved from a dysfunctional, poverty-stricken school where she was labeled a “mediocre” teacher to a more upscale campus where she was regarded as great. She made essentially no changes in how she taught or connected with her students.

In a new infographic, Kevin Welner of the National Education Policy Center reinforces the unfairness of expecting schools to reverse the deleterious effects of poverty by themselves.

Finally, A Broader, Bolder Approach to Education, an organization devoted to addressing these broader issues, just relaunched its efforts following the enactment of the federal ESSA legislation.

Isabel Sawhill and Edward Rodrigue identify three factors that have a major effect on whether an individual remains in poverty. They found that graduating from high school, being in a family with at least one full-time worker, and being at least 21 and married before having children correlate closely with economic success. They describe this as “the success sequence.” According to these researchers, only 2.4% of Americans who follow the success sequence live below the poverty line, while over 70% enjoy at least middle-class incomes, defined as at least 300% of the poverty line. Among those who meet none of the three criteria, 79% live in poverty. Only one of these factors is directly school related: graduation rates. Having an adult with a full-time job depends on successful job creation efforts.

According to Isabel Sawhill, “If we want to reduce poverty, one of the simplest, fastest and cheapest things we could do would be to make sure that as few people as possible become parents before they actually want to.” Here is an example of what could be done to substantially lower teen pregnancy and thus improve educational performance. From 2009 to mid-2015, a dramatically successful program in Colorado cut teen pregnancy and abortion rates by nearly 50% by providing teenagers with free long-acting birth control such as IUDs. The Susan Thompson Buffett Foundation initially funded the program, but when the grant ended in 2015, the Republican-controlled legislature killed a bill to support this successful effort. Private donations saved the program for a year.

Tests Are Not Reliable Measures of Teacher Performance

Turning to in-school issues: the technical capacity of current student tests to accurately identify high- and low-performing teachers is woefully inadequate. In the past few years, a compelling body of research has emerged that demonstrates the dangers of test-based teacher evaluations. Three major research organizations—the American Educational Research Association (AERA), the National Academy of Education, and the American Statistical Association (ASA)—have forcefully warned against employing these measures for teacher evaluations. The AERA issued standards for teacher evaluation measures, which virtually no existing instruments meet.

Value-added measures (VAMs) are currently a popular tool. They attempt to assess a teacher’s contribution to student growth by aggregating individual student scores adjusted for socioeconomic factors. Like other tools in widespread use, they are not accurate enough for evaluating teachers. A seminal critique of the growing use of test scores and value-added measures was written by Linda Darling-Hammond, Audrey Amrein-Beardsley of Arizona State University, Edward Haertel of Stanford, and Jesse Rothstein of the University of California, Berkeley. Their research revealed how inexact the measures are, and they present case studies of egregious misidentification in which excellent teachers were labeled low performing and unfairly dismissed. In Teacher and Student Evaluation: Moving Beyond the Failure of School Reform, Alyson Lavigne and Thomas Good provide a comprehensive analysis of the history of teacher evaluations; their analysis also found present strategies to be defective. Further, Rick Stiggins, in his 2014 book Defensible Teacher Evaluation: Student Growth through Classroom Assessment, reviews the major deficiencies of current high-stakes, test-driven teacher evaluation.

In March 2015, the respected publication Educational Researcher devoted an entire issue to critiques of the most common VAMs, along with some supporting statements with caveats. For a list of the top 15 research articles that discredit the use of test scores and VAM approaches, see Amrein-Beardsley’s VAMboozled website, which also includes a recommended reading list and catalogs 86 articles raising major technical concerns about VAM. For an exhaustive list, more information, and research articles that discredit the use of test scores and VAM approaches, see the briefing paper Problems with the Use of Student Test Scores to Evaluate Teachers and the article “Studies Highlight Complexities of Using Value-Added Measures.” Both make a compelling case against test-driven teacher evaluation. Professor Edward Haertel of Stanford has written a particularly persuasive admonition against this practice, as has Leo Casey in “The Will to Quantify: The ‘Bottom Line’ in the Market Model of Education Reform,” published in the esteemed, peer-reviewed journal Teachers College Record.

Two reports published in 2016 underscore the serious limitations of VAM. The first describes how VAM use failed in the Houston Independent School District (HISD); the second, produced by the Regional Educational Laboratory (REL) at WestEd, discusses how poorly VAM predicts teacher quality.

For further reading on the limits of VAM as a measure of teacher quality, see “A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” by Stuart S. Yeh, Amrein-Beardsley’s blog post “The (Relentless) Will to Quantify,” and a critique of New York’s plan to use VAM methods to evaluate teachers. Finally, the Melbourne Graduate School of Education website has a compelling video lecture on why test-score-based evaluation does not work.

Evaluations Based on Test Scores Misidentify Teachers

These well-respected researchers make the following points. The section “Standardized Tests Are Not the Best Measures of School or Teacher Quality” in the companion article Reformers Target the Wrong Levers of Improvement made the case that such scores fail to accurately measure deeper and broader learning.

As importantly, student tests were never designed to be used for teacher evaluation and suffer from high levels of misidentification, or noise. Studies have shown that on current tests, a teacher ranked at the 50th percentile could actually be anywhere from the 15th to the 85th percentile. A significant number of teachers bounce from the top to the bottom of the rankings, or vice versa, from year to year. A recent report from the US Department of Education found very high rates of misidentification, even with three years of data per teacher. One-fourth of the teachers identified as “low performers needing remediation” were actually in the mid-range of performance, and one-fourth of teachers deemed “average” were actually in need of professional development and support. That level of imprecision should be unacceptable for any respected profession.
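To see how this kind of volatility can arise from measurement noise alone, here is a minimal simulation sketch. It is not the methodology of the studies cited above, and every number in it is hypothetical: 1,000 teachers, classes of 25, a single noisy test, and true teacher effects accounting for roughly 10% of score variance. It simply tracks where a genuinely median teacher lands when teachers are ranked by their class averages.

```python
# Toy illustration only: hypothetical numbers, not the cited studies' models.
import random
import statistics

random.seed(42)

N_TEACHERS = 1000
CLASS_SIZE = 25
TEACHER_SD = 1.0   # spread of true teacher effects
STUDENT_SD = 3.0   # student-level noise; teacher effect is ~10% of total variance

def observed_percentile_of_median_teacher():
    true_effects = [random.gauss(0, TEACHER_SD) for _ in range(N_TEACHERS)]
    # Observed rating = true effect + average of noisy student results
    observed = [
        t + statistics.mean(random.gauss(0, STUDENT_SD) for _ in range(CLASS_SIZE))
        for t in true_effects
    ]
    # Find the teacher whose TRUE effect sits exactly at the median...
    median_teacher = sorted(range(N_TEACHERS), key=lambda i: true_effects[i])[N_TEACHERS // 2]
    # ...and see what percentile the noisy measure assigns to that same teacher.
    rank = sorted(range(N_TEACHERS), key=lambda i: observed[i]).index(median_teacher)
    return 100 * rank / N_TEACHERS

percentiles = [observed_percentile_of_median_teacher() for _ in range(200)]
print(f"A truly median teacher was rated anywhere from the "
      f"{min(percentiles):.0f}th to the {max(percentiles):.0f}th percentile "
      f"across {len(percentiles)} simulated years")
```

Under these assumed numbers, a genuinely average teacher regularly lands anywhere between roughly the 15th and 85th percentiles, and occasionally even further out, echoing the spread the studies above report.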

A team of renowned researchers set out to demonstrate the absurdity of using student tests to determine teacher effectiveness. They found that changes in students’ height, which are obviously independent of teacher influence, were nearly as predictive of teacher “effectiveness” as test scores.

Examples of Test-Based Evaluations That Fail Exemplary Teachers

As a result of districts using these suspect measures, there have been numerous cases of top-flight teachers receiving negative scores and of teachers identified one year as “stellar” receiving a low rating the next. In some cases this occurred because a teacher voluntarily agreed to take on a more difficult class and then suffered by comparison with the easier classes he or she had taught in previous years.

The telling case of Pascale Mauclair clearly demonstrates how dangerous it can be to use such dubious evaluation measures. She agreed to teach a harder-to-educate class and did a superb job, but instead of being congratulated, she was branded by the press as one of the worst teachers in the city. A study conducted by Darling-Hammond and her colleagues also documented tragic cases of misidentification.

Finally, some teachers are taking the issue to court, claiming that current evaluation procedures are so arbitrary as to be fatally flawed. An example is Sheri Lederman’s case before the New York courts. Each year she works with students who achieve double the state’s average proficiency rates, but because her students scored so high in previous years, she did not meet the state’s ill-conceived standard requiring growth from year to year. As a result, Lederman received a low rating. The trial court held that the existing system was “arbitrary and capricious” as applied to her and voided the rankings based on test scores.

Consequently, when such growth measures are used, results can be extremely arbitrary at the upper ends. My favorite example is the ludicrous case of Carolyn Abbott, an exemplary teacher of gifted students in New York City. Her students made huge gains each year, invariably scoring at the highest levels. In one year, her previous gifted class scored at the 98th percentile and, based on that high performance, her predicted next year’s score became the 97th percentile. The actual score landed at the 89th percentile. Many of her gifted students saw no reason to try hard on the state test since they were doing much more advanced work. On other tests that had consequences for students, they scored in the highest ranks. Even though Abbott was doing an exemplary job, the newspapers dubbed her the “Worst 8th Grade Teacher in the City.” The real story was the complete opposite.

I have some important feedback for the news media in this country: Shame on you for rushing to publish teacher rankings when you know, or should know, that these lists are bogus and prone to error. Even the more thoughtful advocates of VAMs caution against their use for high-stakes personnel decisions.

Other Flaws in Test-Based Evaluation Systems

As previously mentioned, test scores and even evaluations by principals tend to track the socioeconomic status of the student population. So schools in low-income areas have a significantly higher number of low evaluations and fewer high evaluations—a clearly unjust situation and a surefire detriment to attracting our best teachers to these areas in need.

In addition, no one yet has solved the problem that most teachers are not math and reading teachers. Thus, they do not teach the math and reading content tested on the new PARCC and SBAC assessments, yet they are still held accountable based on those test scores. As a result, many teachers are now suing after receiving low evaluations based on the test performance of students they never taught. Another major defect in the measures being used for evaluation is that different assessment instruments yield widely dissimilar results. This is further confirmation of the inaccuracy and inadequacy of these measures.

Further, individual evaluations do not take into account school context, which has a large influence on teacher performance. The students of two similarly talented teachers will score differently if one group of students is in a school led by an effective principal with working teams, a good school climate, and active engagement of students and parents while the other is in a dysfunctional school with none of these attributes.

Finally, yearly decisions about which students are assigned to which teachers can tremendously skew a teacher’s evaluation. Nonrandom assignment of students vitiates a key requirement of valid teacher evaluation systems, subjecting teachers to potential principal favoritism and pressure from parents. Newer VAMs, now in wider use, were supposedly designed to correct for this, but apparently they do not. On the other hand, one of the best predictors of student achievement is whether teachers are assigned to teach classes in their areas of expertise or classes that match their skill set. Ironically, when districts use information from test-based evaluations as a proxy for mis-assignment and then reassign teachers to subjects aligned with their preparation and experience, students enjoy a much greater boost in performance than they gain from dismissals driven by value-added teacher accountability. Perhaps the best use of high-stakes testing is holding administrators accountable for the proper assignment of teachers, rather than serving as an unsound basis for teacher evaluation.

Test-Based Evaluations Do Not Measure Good Teaching and Harm the Profession

Do evaluations based in large part on math and reading test scores actually measure “good teaching”? A 2014 report by well-regarded researchers Morgan Polikoff and Andrew Porter says no. They looked at six districts nationwide and found that measures of students’ opportunity to learn (OTL) the content specified in standards and measures of instructional quality, both of which have been found to be highly predictive of student learning, showed weak or zero correlation with the VAM scores being used to evaluate teachers.

The upshot of all this research is that not only is test-based teacher evaluation unfair to the limited number of teachers who can benefit from professional support, but the arbitrary threat issued to all teachers impairs their performance and discourages them from remaining in the profession. The Test-and-Punish approach has also had a damaging effect on efforts to recruit new talent. Defective evaluation schemes have many negative consequences, including teachers avoiding hard-to-teach children and resisting collaborative team-building efforts. The demeaning rhetoric about widespread teacher incompetence is another key factor contributing to growing teacher demoralization. For more on this topic, see the companion article Reformers Allowed Their Rhetoric to Be Hijacked.

The op-ed piece “Standardized Tests Don’t Help Us Evaluate Teachers” is an in-the-trenches summary by a Los Angeles Unified attorney who helped create teacher evaluations and now finds them defective. In The Teacher Wars, Dana Goldstein offers an excellent account of the disastrous consequences of personnel decisions tied to student test scores. For an excellent summary of why test scores should not be used for consequential evaluations, see David Berliner’s piece.

Finally, a 2016 extensive report on teacher evaluation policies by Thomas Toch recommends using evaluative information primarily for program and teacher improvement. This shifts the major purpose of evaluations from rooting out the lowest performers to policies aimed at lifting the whole staff.

The business community has been moving away from ranking schemes for decades, recognizing that such evaluations are superficial, work against team building, cause lower performance, and discourage risk taking by employees. In a clear case of hypocrisy, many business leaders have no compunction about recommending such discarded measures for schools.

Fortunately, many school districts and states have been withdrawing from or minimizing the use of mandatory test-based teacher evaluations leading to dismissal proceedings. Many others are using teacher evaluations as just one of many data sets that provide useful information and feedback to teachers and faculties about where to concentrate improvement efforts. Three examples are Tulsa, the state of Michigan, and Houston. State Action to Advance Teacher Evaluation, a comprehensive report by the Southern Regional Education Board, and the California blueprint Greatness by Design advocate using evaluations to feed back useful information for teacher improvement.

One of the most prominent architects of teacher evaluation, Charlotte Danielson, whose rubrics are in widespread use, has castigated the present way evaluations are being conducted and used. Even New York governor Andrew Cuomo, who was a strong advocate for test-based, high-stakes teacher evaluation, has backtracked, and the New York Regents have halted required state test-based teacher evaluations for four years. Many educational leaders, such as New York’s Nassau County superintendents, had warned against this practice. Some political leaders, such as Hillary Clinton, are also beginning to speak out about the dangers of test-based teacher evaluation. Finally, a court in New Mexico found that VAM scores based on tests are too imprecise to be used to attach consequences to the results.

Bill Gates, one of the strongest proponents of teacher evaluation strategies, has issued warnings about their overuse:

Too many school systems are using teacher evaluations as merely a tool for personnel decisions, not helping teachers get better. . . . Many systems today are about hiring and firing, not a tool for learning.

In response to this growing resistance, the recent reauthorization of the federal Elementary and Secondary Education Act (ESEA), now named the Every Student Succeeds Act (ESSA), omits test-based teacher evaluation.

A More Effective Approach to Teacher Evaluation

Preliminarily, advocates of high-stakes teacher evaluation have a misguided view of “teacher quality.” They think it is a static individual attribute that, after the first few years, can’t really change. A more sophisticated viewpoint sees teacher quality as dynamic, something that does and should grow over time. Esther Quintero of the Albert Shanker Institute supports this point of view. Writing for the institute’s blog, Quintero explains:

In the US, a number of unstated but common assumptions about “teacher quality” suffuse the entire school improvement conversation. As researchers have noted . . . instructional effectiveness is implicitly viewed as an attribute of individuals, a quality that exists in a sort of vacuum (or independent of the context of teachers’ work), and which, as a result, teachers can carry with them, across and between schools. Effectiveness also is often perceived as fairly stable: teachers learn their craft within the first few years in the classroom and then plateau, but, at the end of the day, some teachers have what it takes and others just don’t. So, the general assumption is that a “good teacher” will be effective under any conditions, and the quality of a given school is determined by how many individual “good teachers” it has acquired.

In British Columbia, Hong Kong, Shanghai and Singapore, none of these assumptions seems to be at work. Teacher effectiveness is not something fixed that individual teachers do or don’t possess. Rather, effectiveness is both a quality and an aspiration of schools: Schools ought to be organized and resourced so that teachers continuously and collaboratively improve. In these high performance systems, the whole (school effectiveness) is greater than the sum of its parts (individual teacher effectiveness) because, as Susan Moore Johnson argues:

Whatever level of human capital schools acquire through hiring can subsequently be developed through activities such as grade-level or subject-based teams of teachers, faculty committees, professional development, coaching, evaluation, and informal interactions. As teachers join together to solve problems and learn from one another, the school’s instructional capacity becomes greater than the sum of its parts.

The Learning Policy Institute published a report by Kini and Podolsky, Does Teaching Experience Increase Teacher Effectiveness? A Review of the Research, which debunks the idea that teachers stop becoming more effective after an initial three-year learning spurt. Obviously, well-constructed professional learning will enhance this normal growth process.

Incompetent teachers should be dismissed, but only through a fair process that is part of an overall effort giving teachers the support and time they need to improve before dismissal. With the right resources and approach, many low-performing teachers become good teachers. Our most successful districts do not ignore struggling teachers. They use effective assessments that include feedback, peer participation and review, and support. They have organized schools as learning institutions in which all staff can continuously improve. These districts are also careful when making initial hiring decisions and granting tenure. Ironically, districts that follow these more supportive evaluation strategies often end up with higher dismissal rates than those following the pure Test-and-Punish approach. See the policy brief Evaluation, Accountability, and Professional Development in an Opportunity Culture, which outlines proven, more positive approaches to teacher evaluation.

Lavigne and Good have surveyed the best research and practices in the field. In their 2015 book, Improving Teachers Through Observation and Feedback, they offer powerful suggestions for correctly conducting evaluation in the service of improved performance. Their proposals differ markedly from what most districts are currently doing. In fact, Lavigne and Good emphasize useful feedback and cooperative effort, as opposed to formal evaluations. Information from a teacher’s student tests can help that teacher improve instruction when valid methods, measures, and strategies are employed and checked for accuracy. Again, student test data should not be used in personnel decisions but as part of a broad-scale effort to collect evidence that will help teachers and schools improve. The current emphasis on narrowly conceived, test-based, high-stakes teacher evaluation is unfair and ineffective.

A major report by the Network for Public Education, Teachers Talk Back: Educators on the Impact of Teacher Evaluation, reinforces the view that test-based teacher evaluation is harmful and evaluations should instead focus on improving instruction as some states such as California have done.

Does Dismissing Incompetent Teachers Improve Student Outcomes?

This is the most crucial question for those who support Test-and-Punish. First, after a decade of intensive effort to pursue teacher evaluation schemes, the results have been negligible. Rick Hess reports on a study conducted by Matt Kraft and Allison Gilmour. According to Hess:

[The authors] look at teacher evaluation results in 19 states that have adopted new evaluation systems since 2009. Unfortunately, all that time, money, and passion haven’t delivered much. Kraft and Gilmour note that, after all is said and done, the share of teachers identified as effective in those 19 states inched down from more than 99% to a little over 97% in 2015.

Second, fixating on just the three to five percent sliver of teachers who are not performing, even if the evaluation process were fair and accurate, affects only a small fraction of teachers, with limited payoff. In a school of 20 teachers, eliminating one incompetent teacher helps one class of students but does nothing for the other 19 classes. Yet making test-based evaluations and dismissals a major policy component drags the other 19 teachers into the vortex of legally defensible yet burdensome and often superficial evaluation schemes.

When compared to schoolwide initiatives aimed at improving the entire staff and unleashing their potential as a coordinated team, the effect sizes of firing a failing teacher on overall student performance are small. Contrary to recent reform rhetoric, even if three to five percent of incompetent teachers were dismissed tomorrow, student gains would be minimal. There are much more productive strategies to improve student performance.

Recently, a media frenzy erupted over a research report that claimed a huge benefit from firing the worst teachers. The report sensationalized the effect of replacing a poor teacher with an average teacher by stating that the lifetime earnings of a given class would increase by $266,000. Diane Ravitch has questioned the methodology used in the report. Even if the research were valid and the findings accurate, the boost in earnings is quite trivial. As the report itself states, the figure amounts to about a discounted $7,000 per student per lifetime, or less than $200 per year. Put another way, the reported effect sizes are tiny compared to the payoff from other improvement strategies. Finally, the report admits that the correlations are a low 0.5, which means that large numbers of teachers are identified as lacking who are not, and similar numbers are identified as proficient who are actually struggling.
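As a quick check on that arithmetic, the sketch below simply spreads the cited $7,000 discounted lifetime gain per student over a working life; the 40-year career length is my assumption, not a figure from the report.

```python
# Rough check of the per-student figures cited above.
LIFETIME_GAIN_PER_STUDENT = 7_000   # discounted dollars, as cited in the article
WORKING_YEARS = 40                  # assumed length of a working life (my assumption)

per_year = LIFETIME_GAIN_PER_STUDENT / WORKING_YEARS
print(f"~${per_year:.0f} per year of work")   # ~$175, i.e., less than $200 a year
```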

The Measures of Effective Teaching (MET) project was a major study sponsored by the Gates Foundation. It found that its measures of teacher effectiveness did predict student performance in mathematics, although, again, the effect sizes were small. Significant technical issues were raised about the methodology of this study as well. Critics have asked: Were random assignments fully carried out? Did the teachers of hard-to-educate students participate in sufficient numbers to validate the results? Were all the data reported? Is the report based on the flawed assumption that test scores, principal evaluations, and student surveys predict the same thing?

However, the most damning objection to using the MET report to support high-stakes testing for personnel decisions comes from the report itself. It cautions against such use, noting that the researchers did not determine, or even consider, whether using evaluations for high-stakes personnel decisions might negate their findings. The report conjectures that teaching to the test, narrowing the curriculum, gaming the system, and refusing to cooperate with other teachers when competing for bonuses could well lower student performance. Finally, and critically, the report presents no evidence that identifying who is a high- or low-quality teacher resulted in improved instruction.

As demonstrated by the extensive research cited above, there is thin to nonexistent evidence that a reform strategy focused on firing incompetent teachers produces any significant gains in student achievement. Further, policymakers’ misplaced emphasis on the few suspected lowest performers comes with a huge cost. Frequently, all teachers, regardless of their demonstrated capabilities, are evaluated by expensive, hugely complicated, and time-consuming procedures. These evaluations gravitate toward a checklist mentality of individual items, which trivializes teaching instead of seeing it through a more complex and accurate lens. In addition to the previously mentioned Teacher and Student Evaluation: Moving Beyond the Failure of School Reform (Lavigne and Good, 2014) and The Teacher Wars (Goldstein, 2014), a video produced by WestEd provides an excellent summary of the best research and principles of effective professional evaluation systems.

Can Evaluations by Principals Fix the Problems of Test-Based Accountability?

Relying on principals’ classroom observations cannot obviate the deficiencies of using test scores to evaluate teachers. Evaluations of teachers by principals are heavily influenced by the socioeconomic levels of their students. According to Alisha Kirby:

As the components of teacher evaluations remain under debate among policymakers, a new study suggests the results of classroom observation may hinge more on the students’ capabilities than the teacher’s.

Analysis from the American Institutes for Research and the University of Pennsylvania’s Graduate School of Education found that students’ behavior and prior academic achievement weighs heavily on teacher performance and can skew the results of an evaluation.

“When information about teacher performance does not reflect a teacher’s practice, but rather the students to whom the teacher is assigned, such systems are at risk of misidentifying and mislabeling teacher performance,” reported Rachel Garrett of the American Institutes for Research and Matthew Steinberg from the University of Pennsylvania’s Graduate School of Education.

Two papers reached the same conclusions: Leading via Teacher Evaluation: The Case of the Missing Clothes? and a study published in Educational Evaluation and Policy Analysis.

Further, most principals are not adequately prepared to conduct accurate teacher evaluations. Many now find themselves spending an inordinate amount of time conducting formal classroom observations with extensive item checklists in hand. They are visiting each classroom several times a year rather than spending the time needed for schoolwide efforts that will improve curriculum and instruction. It is a case of evaluation run amok. Lavigne and Good provide a chilling example of this pathology. Under Tennessee’s byzantine and excessive teacher evaluation system, principals must visit each teacher’s classroom four to six times a year. In a school of 20 teachers, that means spending between 176 and 260 hours per year on observation, not assistance. Some research even suggests that classroom observations for purposes of evaluation actually reduce performance.
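For what it's worth, the Tennessee totals are easy to reproduce. The back-of-the-envelope sketch below assumes roughly 2.2 hours per observation cycle (pre-conference, observation, scoring, and write-up), a figure inferred from the totals above rather than stated in the source.

```python
# Back-of-the-envelope check of the Tennessee figures quoted above.
TEACHERS = 20
HOURS_PER_CYCLE = 2.2  # assumed average per observation cycle (inferred, not from the source)

for visits_per_teacher in (4, 6):
    total_hours = TEACHERS * visits_per_teacher * HOURS_PER_CYCLE
    print(f"{visits_per_teacher} visits per teacher: ~{total_hours:.0f} principal hours per year")
# Prints ~176 and ~264 hours, in line with the 176-260 hours cited above.
```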

A pilot report from Chicago found small effects when principals used an evaluation strategy that included two observations of reading teachers per year. The results of the evaluations were used for teacher and school improvement, not harsh consequences. A key finding was that extensive training of principals in observation techniques and how to use the evaluations in program improvement made a large difference. Finally, many walkthroughs by principals miss the essence of good teaching and instead concentrate on trivia, according to Peter DeWitt.

A Narrow Focus on Dismissing Teachers Detracts from Effective Improvement Measures

Crucially, such a narrow policy focus on dismissing a few teachers often leads to a failure to address other vital in-school measures, which significantly influence the performance of all teachers and the achievement of students. For example, large numbers of teachers leave inner-city schools each year. Teacher churn and the resulting heavy use of substitutes are a major reason for low student performance. Excellent teachers are leaving the profession due to the stress of teaching in low-income urban schools and dreadful working conditions. This problem overshadows the damage done by a few underperforming teachers.

Several researchers have recommended policies aimed at encouraging the retention of our best teachers. The New Teacher Project (TNTP) published a report in 2013 entitled The Irreplaceables: Understanding the Real Retention Crisis in America’s Urban Schools. The report laments that most districts do not have policies to encourage the highest-performing 20% of teachers to stay, and as a result the districts suffer high attrition rates. Top teachers want collegiality, membership on effective teams, better working conditions, recognition of their efforts, and career paths that allow them to keep teaching while taking on additional responsibilities, such as helping other teachers or solving school performance problems, and earning more money. Districts that solely concentrate on firing incompetent teachers miss this much larger and more productive target.

It is also important to recognize that the quality of the curriculum and instructional materials is just about as important as teacher quality. For more about the importance of curriculum and educational resources, see the companion article Provide High-Quality Instruction.

In addition, the level of school funding matters. Yes, money does make a difference. Recent reports by moderate and conservative institutions refute reformers’ often-expressed claim that expenditure levels are not a key component of quality. The reports find that increased funding results in improved student performance and, conversely, that cutting school budgets depresses outcomes. Similar results were found in Indiana after the state drastically cut educational support. The companion article Provide Adequate School Funding covers the role of funding in its discussion of district and state support for improving schools.

For a review of the literature showing that funding matters, see Does Money Matter in Education? Unfortunately, the “money doesn’t matter” philosophy and political antipathy to public education in this country have substantially hampered school funding. Most states are still spending less than they did in 2008, and some are cutting even more.

Equally important is site and district leadership, particularly as it relates to building systems that connect teaching, curriculum, and instruction; to continuously improving these elements; and to improving school climate by increasing the engagement of teachers, students, parents, and the community. A recent report by Thomas Kane of Harvard found that teachers’ perception of their school as a good place to work improved performance; in math, the amount of professional development and teacher feedback also helped. Principal leadership accounts for about one-quarter of the in-school influence on student performance, and teacher quality for about one-third.

For a perceptive two-part series on how best to train principals to lead, along with a description of efforts currently under way in four states, see Marc Tucker’s blog posts “Organizations in Which Teachers Can Do Their Best Work,” Part 1 and Part 2. For a comprehensive report on principal training, see The School Principal as Leader: Guiding Schools to Better Teaching and Learning and the standards for school leadership approved by the National Policy Board for Educational Administration in 2015.

There are other essential components of effective improvement efforts: provision of social support and medical services, ongoing professional development and team building for all teachers, and the use of just-in-time assessment systems and valid data on each student’s progress to inform instruction.

Reform measures that emphasize terminating incompetent teachers based on questionable methods not only lower teachers’ morale and efficacy but also inevitably lead to conflict with staff who understand the underlying flaws in the strategy. The evidence is clear: conflict between key stakeholders tends to sabotage the cooperative efforts needed to achieve effective reform. As many have said, “You can’t fire your way to educational greatness.”

Targeting the Lowest-Performing Schools with Closure and Other Drastic Measures Is Usually Ineffective

When it comes to evaluating schools, high-stakes accountability based on tests has been just as ineffective and just as problematic in terms of unintended consequences. Concentrating on the lowest-scoring five percent of schools and responding to their performance with drastic measures (closures, mass firings, or conversion to charters) has produced negligible results. Such reform measures do, however, severely impact those schools, their students, and the surrounding communities. This is even more concerning given that many of the affected schools were unfairly misidentified; they were actually progressing as well as or better than the remaining schools in their districts. The failure of school turnaround policies has been documented by a number of respected sources. In a meta-analysis described by the National Education Policy Center, Tina Trujillo of the University of California, Berkeley, and Michelle Renée of Brown’s Annenberg Institute for School Reform found that school turnaround policies are “more likely to cause upheaval than to help.” See also pages 96–97 of the previously cited Teacher and Student Evaluation: Moving Beyond the Failure of School Reform, and for an overall study of turnaround strategies, see Emerging State Turnaround Strategies, a report prepared by the Education Commission of the States.

States that used tests to grade schools have found major problems with accuracy, and many have reversed the policy. For a critique of Florida’s 15-year failed effort to get school grades right, see “School Scandals Reveal the Problem with Grading Schools.” For a broader, balanced critique of Florida’s reform initiatives, see the Shanker Institute’s policy brief. Many have questioned whether the state’s reform formula and direction were actually the driving force behind the early gains; instead, they point to the efforts of excellent local superintendents who stressed the Build-and-Support approach. Florida’s gains have since stalled following school-funding cutbacks, massive charter expansion, and stringent accountability measures. There are also reports showing that segregation and in-school deficiencies matter considerably more than school-to-school differences in predicting achievement gaps.

This research demonstrates that, as of yet, the knowledge base for identifying failing schools is not sufficiently developed to allow for fair assessments. As a result, many local sites are labeled as failures simply because they have large numbers of poor students and/or students of color. In addition, there is no clear research-based consensus regarding the best ways to intervene in low-performing schools. For example, recent evaluations of the federal School Improvement Grants program aimed at the lowest-performing schools found a slight overall improvement, but one-third of the grantees actually had falling scores. The feds are currently providing a bit more flexibility to applicants under the program, admitting that their previous prescriptions were off base. Moreover, even if reform efforts were fair and successful, focusing on the few schools at the bottom ignores the vast majority of children. As Michael Fullan, one of the most respected leaders of the Build-and-Support approach, has pointed out, policies aimed at improving all schools have far better results. Edward Fiske and Helen Ladd made a similar point in an op-ed about successful low-income districts in London. The districts that flourished pursued a districtwide strategic improvement plan as opposed to targeting the lowest performers, used broad accountability systems that went beyond test scores, and provided support for low-income students.

Recent Developments

9/14/2016 Where school turnarounds have been successful, they have been embedded in overall district improvement efforts and have avoided a punitive approach. A new report by the Center for American Progress https://www.americanprogress.org/issues/education/report/2016/09/13/143922/7-tenets-for-sustainable-school-turnaround/ identifies seven tenets for sustainable school turnaround:

Grant districts, and ultimately the state, the authority to intervene in failing schools.

Provide significant resources to support planning and restructuring and leverage competitive grants.

Treat the district as the unit of change and hold it accountable for school improvement.

Create transparent tiers of intervention and support combined with ongoing capacity building and sharing best practices.

Promote stakeholder engagement.

Create pipeline programs for developing and supporting effective turnaround school leaders.

Embed evaluation and evidence-based building activities in school implementation.

7/30/2016 Audrey Amrein-Beardsley reviewed an excellent 1986 piece by Ed Haertel on the deficiencies of test-based evaluations of teachers. http://vamboozled.com/wp-content/uploads/2015/01/Haertel_1986.pdf She details six major points Haertel makes, all consistent with the article above.

7/30/2016 Another researcher debunks the value of value-added measures (VAMs) for teacher evaluation. http://vamboozled.com/vams-are-never-accurate-reliable-and-valid/ Another district eliminated VAMs for teacher evaluation. http://vamboozled.com/no-more-evaas-for-houston-school-board-tie-vote-means-non-renewal/

BBS Companion Articles

Why Conventional School “Reforms” Have Failed
Reformers Target the Wrong Levers of Improvement
How Top Performers Build-and-Support
Provide High-Quality Instruction
Provide Adequate School Funding

Reference Notes

Hattie, J. (2008). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. London: Routledge.

Firing Teachers Based on Students’ Test Scores Is Not the Answer
Aspen Institute. (2016, Mar). Evaluation and Support Systems: A Roadmap for Improvement. http://www.aspeninstitute.org/publications/teacher-evaluation-support-systems-roadmap-improvement  See also Brown, C., Partelow L., & Konoske-Graf, A. (2016, Mar 16). Educator Evaluation: A Case Study of Massachusetts’ Approach. https://www.americanprogress.org/issues/education/report/2016/03/16/133038/educator-evaluation/  Humphrey, D., Koppich, J., & Tiffany-Morales, J. (2016, Mar). Replacing Teacher Evaluation Systems with Systems of Professional Growth: Lessons from Three California School Districts and Their Teachers’ Unions. https://www.sri.com/work/publications/replacing-teacher-evaluation-systems-systems-professional-growth-lessons-three  Taylor Kerchner, C (2016, Mar 21). Five Lessons for Creating Effective Teacher Evaluations. http://blogs.edweek.org/edweek/on_california/2016/03/five_lessons_for_creating_effective_teacher_evaluation.html

Kahlenberg, R. D. (2015, Summer). Tenure: How Due Process Protects Teachers and Students. American Educator. http://www.aft.org/ae/summer2015/kahlenberg

Goldstein, D. (2014). The Teacher Wars: A History of America’s Most Embattled Profession. New York: Doubleday.

Teacher Quality: Putting the Issue in Perspective
Haertel, E. H. (2013). Reliability and Validity of Inferences About Teachers Based on Student Test Scores. Educational Testing Service. http://www.ets.org/research/policy_research_reports/publications/publication/2013/jquq

Putnam, R. D. (2015). Our Kids: The American Dream in Crisis. New York: Simon & Schuster. For other works on the same topic, see Morsy, L., & Rothstein, R. (2015, Jun 10). Five Social Disadvantages That Depress Student Performance: Why Schools Alone Can’t Close Achievement Gaps. Economic Policy Institute. http://www.epi.org/publication/five-social-disadvantages-that-depress-student-performance-why-schools-alone-cant-close-achievement-gaps/?utm_source=Economic+Policy+Institute&utm_campaign=26b9c8a34e-EPI_News_06_12_156_12_2015&utm_medium=email&utm_term=0_e7c5826c50-26b9c8a34e-55876685 Berliner, D. C. (2013). Effects of Inequality and Poverty vs. Teachers and Schooling on America’s Youth. www.tcrecord.org/content.asp?contentid=16889 See also Summers, L. H., & Balls, E. (2015, Jan). Report on the Committee for Inclusive Prosperity. https://www.americanprogress.org/issues/economy/report/2015/01/15/104266/report-of-the-commission-on-inclusive-prosperity/

Rich, M., Cox, A., & Bloch, M. (2016, Apr 29). Money, Race, and Success: How Your School District Compares. The New York Times. http://www.nytimes.com/interactive/2016/04/29/upshot/money-race-and-success-how-your-school-district-compares.html?_r=3

Darling-Hammond, L., Wilhoit, G., & Pittenger, L. (2014, Oct 16). Accountability for College and Career Readiness: Developing a New Paradigm. Stanford Center for Opportunity Policy in Education. https://edpolicy.stanford.edu/publications/pubs/1257

Glass, Gene. V. (2016, Apr 5). Take All the Credit? You’ll Get All the Blame. http://ed2worlds.blogspot.com/2016/04/take-all-credit-youll-get-all-blame.html?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+EducationInTwoWorlds+%28Education+in+Two+Worlds%29

Strauss, V. (2016, Mar). No, Great Schools Can’t Close Achievement Gaps All by Themselves. The Washington Post. https://www.washingtonpost.com/news/answer-sheet/wp/2016/03/21/no-great-schools-cant-close-achievement-gaps-all-by-themselves/

A Broader, Bolder Approach to Education. http://www.boldapproach.org/

Sawhill, I. V., & Rodrigue, E. (2015, Nov 18). An Agenda for Reducing Poverty and Improving Opportunity. Brookings. http://www.brookings.edu/research/papers/2015/11/campaign-2016-presidential-candidates-poverty-and-opportunity

Kerwin McCrimmon, K. (2015, Aug 27). Private Money Saves Colorado IUD Program as Fight Continues for Public Funding. Kaiser Health News. http://khn.org/news/private-money-saves-colorado-iud-program-as-fight-continues-for-public-funding/ See also Tavernise, S. (2015, Jul 5). Colorado’s Effort Against Teenage Pregnancies Is a Startling Success. The New York Times. http://www.nytimes.com/2015/07/06/science/colorados-push-against-teenage-pregnancies-is-a-startling-success.html

Tests Are Not Reliable Measures of Teacher Performance
American Education Research Association and National Academy of Education. Getting Teacher Evaluation Right: A Brief for Policymakers. https://edpolicy.stanford.edu/publications/pubs/421

American Statistical Association. (2014, Apr 8). ASA Statement on Using Value-Added Models for Educational Assessment. OpEd News. http://www.opednews.com/Quicklink/ASA-Statement-on-Using-Val-in-Best_Web_OpEds-Administration_Caution_Mandates_Teacher-140412-203.html

American Education Research Association. (2015, Nov). AERA Statement of Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs. Educational Researcher. http://online.sagepub.com/search/results?submit=yes&src=hw&andorexactfulltext=and&fulltext=AERA+Statement+of+Use+of+Value-Added+Models+%28VAM%29+for+the+Evaluation+of+Educators+and+Educator+Preparation+Programs&x=0&y=0

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012, Mar 15). Evaluating Teacher Evaluation: We Know About Value-Added Models and Other Methods. Phi Delta Kappan. http://www.edweek.org/ew/articles/2012/03/01/kappan_hammond.html

Lavigne, A. L., & Good, T. L. (2013). Teacher and Student Evaluation: Moving Beyond the Failure of School Reform. New York: Routledge.

Stiggins, R. J. (2014). Defensible Teacher Evaluation: Student Growth Through Classroom Assessment. Thousand Oaks, CA: Corwin Press.

Ballou, D., & Springer, M. G. (2015). Using Student Test Scores to Measure Teacher Performance: Some Problems in the Design and Implementation of Evaluation Systems. Educational Researcher. http://edr.sagepub.com/content/44/2/77.full.pdf+html?ijkey=WSTBFIHcTyO9I&keytype=ref&siteid=spedr

Amrein-Beardsley, A. (n.d.). Top 15 Research Articles About VAMS. http://vamboozled.com/research-articles-on-vams/

Amrein-Beardsley, A. (n.d.). All Recommended Articles About VAMS. http://vamboozled.com/recommended-reading/value-added-models/

Shavelson, R. J., Linn, R. L., Baker, E. L., et al. (2010, Aug 27). Problems with the Use of Student Test Scores to Evaluate Teachers. Economic Policy Institute. http://www.epi.org/publication/bp278/

Yettick, H. (2014, May 13). Studies Highlight Complexities of Using Value-Added Measures. Education Week. http://www.edweek.org/ew/articles/2014/05/13/32value-add.h33.html

Haertel, E. H. (2013). Reliability and Validity of Inferences About Teachers Based on Student Test Scores. Educational Testing Service. http://www.ets.org/research/policy_research_reports/publications/publication/2013/jquq

Casey, L. M. (2013). The Will to Quantify: The “Bottom Line” in the Market Model of Education Reform. Teachers College Record. http://www.tcrecord.org/Content.asp?ContentId=17107

Amrein-Beardsley, A., Collins, C., Holloway-Libell, J., & Paufler, N. (2016, Jan 5). Everything is Bigger (and Badder) in Texas: Houston’s Teacher Value-Added System. Teachers College Record. http://www.tcrecord.org/Content.asp?ContentId=18983

Lash, A., Makkonen, R., Tran, L., & Huang, M. (2016, Jan). Analysis of the Stability of Teacher-Level Growth Scores from The Student Growth Percentile Model. WestEd. https://relwest.wested.org/resources/210?utm_source=REL+West+Mailing+List&utm_campaign=05c37febff-ee-4-1&utm_medium=email&utm_term=0_316bfe94f7-05c37febff-92259833

Yeh, S. (2013). A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling. Teachers College Record. http://www.tcrecord.org/Content.asp?ContentID=16934

Amrein-Beardsley, A. (2015, Apr 30). The (Relentless) Will to Quantify. http://vamboozled.com/the-relentless-will-to-quantify/

Anrig, G. (2015, Mar 25). Value Subtracted: Gov. Cuomo’s Plot to Tie Teacher Evaluations to Test Scores Won’t Help Our Public Schools. Slate. http://www.slate.com/articles/life/education/2015/03/gov_andrew_cuomo_and_teacher_evaluations_standardized_test_scores_are_the.html

Berliner, D. C. (2015, Aug 11). Teacher Evaluation and Standardized Tests: A Policy Fiasco. Melbourne Graduate School of Education. http://education.unimelb.edu.au/news_and_activities/events/upcoming_events/dean_lecture_series/dls-past-2015/teacher-evaluation-and-standardised-tests-a-policy-fiasco

Evaluations Based on Test Scores Misidentify Teachers
Schochet, P. Z., & Chiang, H. S. (2010, Jul). Technical Methods Report: Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains. National Center for Education Evaluation and Regional Assistance. https://ies.ed.gov/ncee/pubs/20104004/

Bitler, M. P., Corcoran, S. P., Domina, T., & Penner, E. K. (2014, Spring). Teacher Effects on Student Achievement and Height: A Cautionary Tale. The Society for Research on Educational Effectiveness. https://archive.org/stream/ERIC_ED562824/ERIC_ED562824_djvu.txt

Examples of Test-Based Evaluations That Fail Exemplary Teachers
Hirsch, M. (2012, Mar 1). The True Story of Pascale Mauclair. New Politics. http://newpol.org/content/true-story-pascale-mauclair

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012, Mar 15). Evaluating Teacher Evaluation. Phi Delta Kappan. http://www.edweek.org/ew/articles/2012/03/01/kappan_hammond.html

Ravitch, D. (2015, Aug 7). Bruce Lederman Explains the Challenge to New York State Teacher Evaluation System. http://dianeravitch.net/2015/08/07/bruce-lederman-explains-the-challenge-to-new-york-state-teacher-evaluation-system/

Harris, E.A. (2016, May 10). Court Vacates Long Island Teacher’s Evaluation Tied to Student Test Scores. The New York Times. http://www.nytimes.com/2016/05/11/nyregion/court-vacates-long-island-teachers-evaluation-tied-to-student-test-scores.html

Pallas, A. (2012, May 16). Meet the “Worst” 8th Grade Math Teacher in NYC. The Washington Post. http://www.washingtonpost.com/blogs/answer-sheet/post/meet-the-worst-8th-grade-math-teacher-in-nyc/2012/05/15/gIQArmlbSU_blog.html

Other Flaws in Test-Based Evaluation Systems
Whitehurst, G. J., Chingos, M. M., & Lindquist, K. M. (Winter 2015). Getting Classroom Observations Right. EducationNext. http://educationnext.org/getting-classroom-observations-right/ See also Kirby A. (2016, Jan 21). Study Finds Flaws in Teacher Performance Observations. https://www.cabinetreport.com/human-resources/study-finds-flaws-in-teacher-performance-observations

Amrein-Beardsley, A. (2015, Oct 8). Teacher Evaluation Systems “At Issue” Across U. S. Courts. http://vamboozled.com/teacher-evaluation-systems-at-issue-across-u-s-courts/

Paufler, N. A., & Amrein-Beardsley, A. A. (2013, Jul 25). The Random Assignment of Students into Elementary Classrooms: Implications for Value-Added Analyses and Interpretations. American Educational Research Journal. http://aer.sagepub.com/content/51/2/328.

Condie, S., Lefgren, L., & Sims, D. (2014, Jun). Teacher Heterogeneity, Value-Added and Education Policy. Economics of Education Review. http://www.sciencedirect.com/science/article/pii/S0272775713001647

Test-Based Evaluations Do Not Measure Good Teaching and Harm the Profession
Polikoff, M. S., & Porter, A. C. (2014, May). Instructional Alignment as a Measure of Teaching Quality. Educational Evaluation and Policy Analysis. http://www.aera.net/Newsroom/RecentAERAResearch/InstructionalAlignmentasaMeasureofTeachingQuality/tabid/15510/Default.aspx See also Barshay, J. (2014, May 13). Researchers Give Failing Marks to National Effort to Measure Good Teaching. http://educationbythenumbers.org/content/researchers-say-pennsylvanias-measurement-teacher-effectiveness-doesnt-measure-good-teaching_1238/ and Ravitch, D. (2016, Mar 16). John Thompson: The Utter Failure of Standardized Teacher Evaluation. http://dianeravitch.net/2016/03/16/johnthompson-the-utter-failure/

Johnson, S. M. (2015, Jul 29). Four Unintended Consequences of Using Student Test Scores to Evaluate Teachers. The Washington Post. http://www.washingtonpost.com/blogs/answer-sheet/wp/2015/07/29/four-unintended-consequences-of-using-student-test-scores-to-evaluate-teachers/

Kirby, A. (2015, Aug 27). High-Stakes Teacher Evaluations May Not Help. https://www.cabinetreport.com/human-resources/high-stakes-teacher-evaluations-may-not-help See also Bryant, Jeff (2016, April) We Won’t Improve Education by Making Teachers Hate Their Jobs. http://educationopportunitynetwork.org/we-wont-improve-education-by-making-teachers-hate-their-jobs/

Kwalwasser, H. (2015, Sep 15). Standardized Tests Don’t Help Us Evaluate Teachers. Los Angeles Times. http://www.latimes.com/opinion/op-ed/la-oe-0910-kwalwasser-standardized-testing-problems-20150910-story.html

Goldstein, D. (2014). The Teacher Wars: A History of America’s Most Embattled Profession. New York: Doubleday.

Amrein-Beardsley, A. (2015, Dec 29). VAMboozled!: Why Standardized Tests Should Not Be Used to Evaluate Teachers (and Teacher Education Programs). http://nepc.colorado.edu/blog/why-standardized-tests

Toch, T. (2016, May). Grading the Graders: A Report on Teacher Evaluation Reform in Public Education. Center on the Future of American Education. https://georgetown.app.box.com/s/f47qnfh63wfxhxqu88pu5r0y0tkbo6bk

Feintzeig, R. (2015, Apr 21). The Trouble with Grading Employees. The Wall Street Journal. http://www.wsj.com/articles/the-trouble-with-grading-employees-1429624897 See also Korkki, P. (2015, Jul 11). Why Employee Ranking Can Backfire. The New York Times. http://mobile.nytimes.com/2015/07/12/business/why-employee-ranking-can-backfire.html?_r=1&referrer

Ravitch, D. (2015, Oct 21). John Thompson: The Gates Plan Failed in Tulsa, Now What? http://dianeravitch.net/2015/10/21/john-thompson-the-gates-plan-failed-in-tulsa-now-what/

Kirby, A. (2015, Nov 9). Michigan Bill Rolls Back Test Scores in Teacher Evaluations. https://cabinetreport.com/politics-education/michigan-bill-rolls-back-test-scores-in-teacher-evaluations

Amrein-Beardsley, A. (2015, Nov 9). Houston Board Candidates Respond to Their Teacher Evaluation System. http://vamboozled.com/?s=Houston+board+candidates&submit=Search&__bcf_gupi=1DCE61EDFC3F0001C87B1A304D9B1E821DCE61EDFC4000013A7E99739110F630

Gandha, T. (2016, Feb). State Actions to Advance Teacher Evaluation. Southern Regional Education Board. http://www.sreb.org/publication/state-actions-advance-teacher-evaluation

Tom Torlakson’s Task Force. (2012, Sep). Greatness by Design: Supporting Outstanding Teaching to Sustain a Golden State. http://www.cde.ca.gov/eo/in/ee.asp

Danielson, C. (2016, Apr 18). Charlotte Danielson on Rethinking Teacher Evaluation. Education Week. http://www.edweek.org/ew/articles/2016/04/20/charlotte-danielson-on-rethinking-teacher-evaluation.html?cmp=eml-enl-eu-news2-RM

Taylor, K. (2015, Nov 25). Cuomo, in Shift, Is Said to Back Reducing Test Scores’ Role in Teacher Reviews. The New York Times. http://www.nytimes.com/2015/11/26/nyregion/cuomo-in-shift-is-said-to-back-reducing-test-scores-role-in-teacher-reviews.html?ref=topics&_r=0

Disare, M. (2015, Dec 14). In Big Shift, Regents Vote to Exclude State Tests from Teacher Evals Until 2019. http://ny.chalkbeat.org/2015/12/14/breaking-in-big-shift-regents-vote-to-exclude-state-tests-from-teacher-evals-until-2019/?utm_source=Master+Mailing+List&utm_campaign=f54d1b9f78-Rise_Shine_201912_15_2015&utm_medium=email&utm_term=0_23e3b96952-f54d1b9f78-75668293#.VnA8EI-cE2y

Tyrrell, J. (2015, Nov 21). Nassau Superintendents: End Teacher Evals Tied to Test Scores. Newsday. http://www.newsday.com/long-island/nassau/nassau-superintendents-end-teacher-evals-tied-to-test-scores-1.11150791

Layton, L. (2015, Nov 16). Clinton Says “No Evidence” That Teachers Can Be Judged by Student Test Scores. The Washington Post. https://www.washingtonpost.com/local/education/clinton-says-no-evidence-that-teachers-can-be-judged-by-student-test-scores/2015/11/16/303ee068-8c98-11e5-baf4-bdf37355da0c_story.html

Ravitch, D. (2015, Dec 17). John Thompson: The Beginning of the End of VAM? http://dianeravitch.net/2015/12/17/john-thompson-the-beginning-of-the-end-of-vam/

Sawchuk, S. (2013, Apr 4). Bill Gates: Don’t Overuse Tests in Teachers’ Evaluations. http://blogs.edweek.org/edweek/teacherbeat/2013/04/bill_gates_dont_overuse_tests_in_teachers_evaluations.html See also Layton, L. (2015, Oct 7). Improving U.S. schools Tougher than Global Health, Gates Says. The Washington Post. https://www.washingtonpost.com/local/education/improving-us-schools-tougher-than-global-health-gates-says/2015/10/07/56da9972-6d05-11e5-b31c-d80d62b53e28_story.html

A More Effective Approach to Teacher Evaluation
Quintero, E. (2016, Feb 23). Beyond Teacher Quality. http://www.shankerinstitute.org/blog/beyond-teacher-quality

Johnson, S. M. (2015, Jun 25). Will Value-Added Reinforce the Walls of the Egg-Crate School? http://www.shankerinstitute.org/blog/will-value-added-reinforce-walls-egg-crate-school

Kini, T., & Podolsky, A. (2016). Does Teaching Experience Increase Teacher Effectiveness? A Review of the Research. Learning Policy Institute. https://learningpolicyinstitute.org/our-work/publications-resources/does-teaching-experience-increase-teacher-effectiveness-review-research/

Public Impact. (2015). Evaluation, Accountability, and Professional Development in an Opportunity Culture. Opportunity Culture. http://opportunityculture.org/evaluation-policy-brief/

Lavigne, A. L., & Good, T. L. (2015). Improving Teaching Through Observation and Feedback: Beyond State and Federal Mandates. New York: Routledge.

Network for Public Education. (2016). Teachers Talk Back: Educators on the Impact of Teacher Evaluation. http://networkforpubliceducation.org/2016/04/6468/

Does Dismissing Incompetent Teachers Improve Student Outcomes?
Kraft, M.A., & Gilmour, A.F. (2016, Feb). Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness. Brown University. http://scholar.harvard.edu/mkraft/publications/revisiting-widget-effect-teacher-evaluation-reforms-and-distribution-teacher

Hess, R. (2016, Mar 8). When Fancy New Teacher-Evaluation Systems Don’t Make a Difference. http://mobile.edweek.org/c.jsp?cid=25920011&item=http%3A%2F%2Fapi.edweek.org%2Fv1%2Fblog%2F76%2F%3Fuuid%3D57146

Lowrey, A. (2012, Jan 6). Big Study Links Good Teachers to Lasting Gain. The New York Times. http://www.nytimes.com/2012/01/06/education/big-study-links-good-teachers-to-lasting-gain.html?_r=0

Ravitch, D. (2014, Aug 11). The Holes in the Chetty et al VAM Study as Seen by the American Statistical Association. http://dianeravitch.net/2014/08/11/the-holes-in-the-chetty-et-al-vam-study-as-seen-by-the-american-statistical-association/

Rothstein, J., & Mathis, W. J. (2013, Jan 31). Review of Two Culminating Reports from the MET Project. National Education Policy Center. http://nepc.colorado.edu/thinktank/review-MET-final-2013

DeWitt, P. (2015, May 11). 3 Reasons Why Your Observations May Be a Waste of Time. http://blogs.edweek.org/edweek/finding_common_ground/2015/05/3_reasons_why_your_observations_may_be_a_waste_of_time.html

WestEd. (2015, Sep). Video: Making Meaningful Use of Teacher Effectiveness Data. https://relwest.wested.org/resources/198

Can Evaluations by Principals Fix the Problems of Test-Based Accountability?
Kirby, A. (2016, Jan 21). Study Finds Flaws in Teacher Performance Observations. https://www.cabinetreport.com/human-resources/study-finds-flaws-in-teacher-performance-observations

Hallinger, P., Heck, R. H., & Murphy, J. (2013, Jul 30). Leading via Teacher Evaluation: The Case of the Missing Clothes? Educational Researcher. http://ecs.force.com/studies/rstudypg?id=a0r70000003ql6SAAQ

American Educational Research Association. (2016, Mar). Educational Evaluation and Policy Analysis. http://eepa.aera.net See also Di Carlo, M. (2015, Feb 25). Student Sorting and Teacher Classroom Observations. http://www.shankerinstitute.org/blog/student-sorting-and-teacher-classroom-observations and Garrett, R., & Steinberg, M. P. (2015, May 21). Examining Teacher Effectiveness Using Classroom Observation Scores. http://epa.sagepub.com/content/early/2014/06/13/0162373714537551

Lavigne, A. L., & Good, T. L. (2015). Improving Teaching Through Observation and Feedback: Beyond State and Federal Mandates. New York: Routledge.

Devaney, L. (2016, Jan 19). Classroom Observations May Hurt Teachers More Than They Help, Study Says. eSchool News. http://www.eschoolnews.com/2016/01/19/classroom-observations-may-hurt-teachers-more-than-they-help-study-says/

DiCarlo, M. (2015, Dec 4). Evidence from a Teacher Evaluation Pilot Program in Chicago. http://www.shankerinstitute.org/blog/evidence-teacher-evaluation-pilot-program-chicago

DeWitt, P. (2016, Apr 19). The Myth of Walkthroughs: 8 Unobserved Practices in Classrooms. http://blogs.edweek.org/edweek/finding_common_ground/2016/04/the_myth_of_walkthroughs_8_unobserved_practices_in_classrooms.html

A Narrow Focus on Dismissing Teachers Detracts from Effective Improvement Measures
American Education Research Association and National Academy of Education. Getting Teacher Evaluation Right: A Brief for Policymakers. https://edpolicy.stanford.edu/publications/pubs/421

Thompson, J. (2015, Sep 10). The Rhino in the Room: Time to End Disruptive Reform. http://www.livingindialogue.com/the-rhino-in-the-room-time-to-end-disruptive-reform/

The New Teacher Project. (2013, Jul 30). The Irreplaceables: Understanding the Real Retention Crisis in America’s Urban Schools. http://tntp.org/publications/view/retention-and-school-culture/the-irreplaceables-understanding-the-real-retention-crisis

Knudson, J. (2013, Sep). You’ll Never Be Better Than Your Teachers: The Garden Grove Approach to Human Capital Development. http://eric.ed.gov/?q=source%3a%22California+Collaborative+on+District+Reform%22&id=ED557950 See also Tucker, M. (2016, Apr). How to Get a First-Rate Teacher in Front of Every Student. http://blogs.edweek.org/edweek/top_performers/2016/04/how_to_get_a_first-rate_teacher_in_front_of_every_student.html?utm_source=feedblitz&utm_medium=FeedBlitzRss&utm_campaign=top_performers

Sawhill, I. V. (2015, Sep 8). Does Money Matter? http://www.brookings.edu/research/opinions/2015/09/08-does-money-matter-education-sawhill See also Jackson, C. K., Johnson, R. C., & Persico, C. (2015, Fall). Boosting Educational Attainment and Adult Earnings. http://educationnext.org/boosting-education-attainment-adult-earnings-schoolspending/

Ravitch, D. (2015, Oct 20). Indiana: Less Money, More Chaos. http://dianeravitch.net/2015/10/20/indiana-less-money-more-chaos/

Baker, B. (2012). Revisiting That Age Old Question: Does Money Matter in Education? http://eric.ed.gov/?q=Does+Money+Matter+in+Education&id=ED528632 See also Baker, B. (2016). Does Money Matter in Education? Second Edition. http://www.shankerinstitute.org/resource/does-money-matter and Spielberg, B. (2015, Oct 20). The Truth About School Funding. http://34justice.com/2015/10/20/the-truth-about-school-funding/

Leachman, M., Albares, N., Masterson, K., & Wallace, M. (2016, Jan 25). Most States Have Cut School Funding, and Some Continue Cutting. http://www.cbpp.org/research/state-budget-and-tax/most-states-have-cut-school-funding-and-some-continue-cutting

Kane, T. J., Owens, A. M., Marinell, W. H., Thal, D. R. C., & Staiger, D. O. (2016, Feb). Teaching Higher: Educators’ Perspectives on Common Core Implementation. http://cepr.harvard.edu/teaching-higher See also Hull, S. J. (2015, Oct 14). Principals Matter—And They Need the Right Start. http://www.learningfirst.org/principals-matter-and-they-need-right-start?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+LFA+%28Public+School+Insights%3A+What+is+WORKING+in+our+Public+Schools%29 and Center for Education Policy Research. (2014–16). Teaching Higher: Educator’s Perspectives on Common Core Implementation. http://cepr.harvard.edu/teaching-higher

Tucker, M. (2015, Aug 13). Organizations in Which Teachers Can Do Their Best Work: Part I. http://blogs.edweek.org/edweek/top_performers/2015/08/organizations_in_which_teachers_can_do_their_best_work_part_i.html

Tucker, M. (2015, Aug 20). Organizations in Which Teachers Can Do Their Best Work: Part II. http://blogs.edweek.org/edweek/top_performers/2015/08/organizations_in_which_teachers_can_do_their_best_work_part_ii.html

The Wallace Foundation. (2013, Jan). The School Principal as Leader: Guiding Schools to Better Teaching and Learning. http://www.wallacefoundation.org/knowledge-center/Pages/The-School-Principal-as-Leader-Guiding-Schools-to-Better-Teaching-and-Learning.aspx

Superville, D. R. (2015, Oct 23). New Professional Standards for School Leaders Are Approved. http://blogs.edweek.org/edweek/District_Dossier/2015/10/new_professional_standards_for.html?r=608789257

Targeting the Lowest-Performing Schools with Closure and Other Drastic Measures Is Usually Ineffective
Miller, T. D., & Brown, C. (2015, Mar 31). Dramatic Action, Dramatic Improvement. Center for American Progress. http://nepc.colorado.edu/thinktank/review-school-turnaround See also Burris, C. (2015, Sep 4). School Closures: A National Look at a Failed Strategy. http://www.networkforpubliceducation.org/2015/09/school-closures-a-national-look-at-a-failed-strategy-2/?can_id=012f354d90b87664b362dda6a4b2980d&source=email-school-closures-a-national-look-at-a-failed-strategy&email_referrer=school-closures-a-national-look-at-a-failed-strategy and American Institutes for Research and Mathematica Policy Research. (May 2015). Evaluation Brief: State Capacity to Support School Turnaround. National Center for Education Evaluation and Regional Assistance. https://ies.ed.gov/ncee/pubs/20154012/

Trujillo, T., & Renée, M. (2012, Oct 1). Democratic School Turnarounds: Pursuing Equity and Learning from Evidence. National Education Policy Center. http://nepc.colorado.edu/publication/democratic-school-turnarounds

Aragon, S., & Workman, E. (2015, Oct). Emerging State Turnaround Strategies. Education Commission of the States. http://www.ecs.org/press-release-emerging-state-turnaround-strategies/ See also Felton, E. (2015, Oct 19). Are Turnaround Districts the Answer for America’s Worst Schools? http://hechingerreport.org/are-turnaround-districts-the-answer-for-americas-worst-schools/

Ehrenhalt, A. (2013, Oct). School Scandals Reveal the Problem with Grading Schools. Governing. http://www.governing.com/columns/col-school-scandals-reveal-testing-ignorance.html

Di Carlo, M. (2015, Jun). The Evidence on the “Florida Formula” for Education Reform. Albert Shanker Institute. http://www.shankerinstitute.org/resource/evidence-florida-formula-education-reform

Sparks, S. D. (2015, Oct 6). Studies Probe How Schools Widen Achievement Gaps. Education Week. http://www.edweek.org/ew/articles/2015/10/07/schools-help-widen-academic-gaps-studies-find.html?r=258221469&cmp=eml-enl-eu-news1-RM

Klein, A. (2014, Sep 15). New Turnaround Options Detailed in Draft SIG Guidance. Education Week. http://www.edweek.org/ew/articles/2014/09/17/04sig.h34.html

Fullan, M. (2011, Nov 17). Choosing the Wrong Drivers for Whole System Reform. http://education.qld.gov.au/projects/educationviews/news-views/2011/nov/talking-point-fullan-101117.html

Fiske, E. B., & Ladd, H. F. (2016, Feb 13). Learning from London About School Improvement. The News & Observer. http://www.newsobserver.com/opinion/op-ed/article60118256.html


The Big Picture
Have High-Stakes Testing and Privatization Been Effective?

by Bill Honig

More and more educators, parents, and community, political, and opinion leaders are becoming aware of the failure of high-stakes accountability based on reading and math test scores (Test-and-Punish) and the failure of privatization hailed as “choice, charters, and competition.” As a result, people are increasingly open to alternative strategies. A viable replacement is staring us right in the face. It is found in our most successful public and charter schools, districts, and states that adopted the more positive, engaging Build-and-Support agenda. This article examines the problem of low student performance and the flawed approach used by conventional reformers who support Test-and-Punish and market-driven solutions. It will summarize the evidence that documents the reform policies’ lack of success and describe the considerable collateral damage these policies have caused.

The Problem of Low Performance: Real or Hype?

The conventional school reform movement began as a response to the perceived low performance of our students. While the reformers’ solutions have been unsound, the problem is very real. Although student performance is currently at its highest level in our history, there is widespread agreement in this country that the increasing educational demands of the job market, the impact of global competition, and the need to preserve our democracy require a substantial improvement in student achievement in our schools and colleges and the narrowing of the performance gap between affluent and low-income, minority, or second-language children. Indisputably, there are excellent classrooms, schools, and districts across the United States. Moreover, there are hundreds of thousands of dedicated teachers, including those teaching in difficult circumstances, who day by day do a superb job with their students. As a result of their efforts, graduation rates and student performance have risen substantially in the past 20 years, although student performance has stalled recently as the harsh policies of the reform movement took hold. At the same time, no one disputes the fact that far too many dysfunctional classrooms, schools, and districts must be improved if students in those settings are going to have any chance at leading a productive life.

In order for our country to stay competitive, virtually every school and district in the US must continually focus on improvement. Some states and districts shine. Massachusetts, for example, outperforms just about every other nation in the world, and Long Beach Unified School District is one of the 20 best districts on the planet. Yet most other states and districts are lagging.

Distressing International Results

Currently, our youngsters significantly underperform students in other industrialized countries—seriously jeopardizing our democratic and economic future. Nor is it just our lower achievers who are lagging. According to one recent international assessment, the Programme for the International Assessment of Adult Competencies (PIAAC), recent US college graduates, students with some college, high school graduates, and high school dropouts are average compared to their global counterparts in terms of the practical applications of literacy, but they are near the bottom in numeracy and at the bottom in technical problem solving.

Similarly troubling are our low scores and declining growth on the 2012 Program for International Student Assessment (PISA), given to 15-year-olds worldwide, especially as these scores relate to students’ math skills. Comparable results were found in the 2011 Trends in Mathematics and Science Study (TIMSS), which tested eighth graders in math and science. In its 2015 review of international assessments, the Organization for Economic Cooperation and Development (OECD) found the US ranking 31st among 76 countries in basic math and science skills, with 23% of our students failing to reach rudimentary levels. The report foresaw a large economic payoff if we are able to improve these results. A recent summary by OECD found no improvement from 2003 to 2012 in the numbers of US students scoring as low performers in math and reading.

Do International Tests Fairly Reflect Socioeconomic Factors?

Analyzing international test results is complex. The failure to accurately account for higher levels of poverty in the US and lower family academic resources (FAR)—such as a mother’s educational level and books in the home—exaggerates our performance gap. In their October 2015 report, Bringing It Back Home, Carnoy, García, and Khavenson adjusted for FAR, which significantly narrowed the gap between the US and other nations, particularly at the lower socioeconomic levels. When our lowest FAR cohorts of students were compared to similar students, the gap closed substantially in both math and reading. However, such adjustments still left our students, on average, performing significantly below other comparable nations.

Overall, our students fared much better in reading than in math—scoring in the middle of other countries. In math, we considerably trailed many postindustrial countries, including France, Germany, and the United Kingdom. Unfortunately, compared to similar FAR cohorts, our middle-range students fell further behind other nations, and our more advantaged levels plunged. Even so, adjusted US TIMSS math scores grew from 1995 to 2011 by a hefty 0.5 standard deviation (SD), or one-half to one year’s added instruction. This growth rate, however, is not sufficient to catch up to many other countries, as they experienced greater increases at the middle and higher socioeconomic levels.
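
As a rough sketch of how an effect size translates into “years of added instruction,” the snippet below divides the cited gain by an assumed rate of annual academic growth. The growth figures are assumptions chosen only to reproduce the one-half-to-one-year reading above; they are not values taken from the TIMSS analysis.

```python
# Converting an effect size (in standard deviations) into "years of added
# instruction" requires an assumed rate of annual academic growth. The range
# below is a hypothetical one; published estimates vary by grade and subject.

effect_size_sd = 0.5                   # adjusted US TIMSS math gain, 1995-2011
assumed_annual_growth_sd = (1.0, 0.5)  # hypothetical SD gained per school year

for growth in assumed_annual_growth_sd:
    years = effect_size_sd / growth
    print(f"If one year of instruction ~ {growth} SD, "
          f"a 0.5 SD gain ~ {years:.1f} years of added instruction")
```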

It is important to note that the PISA and TIMSS scores vary widely among US states. Some of our states, when adjustments for FAR are made, surpass results in the highest-performing nations. For example, on the PISA 2012 test, Massachusetts nearly matches Canada and Finland, two of the top-scoring countries in mathematics; matches Germany; and actually surpasses France and the United Kingdom. In reading, the Bay State outscores all nations but one—Korea. For a nuanced view of US national and international rankings on school performance, see the report by the Horace Mann League and the National Superintendents Roundtable.

Are International Tests Useful Measures of Achievement?

Many experts in the field of education question the value of international tests on the grounds that the tests do not measure important aspects of education such as depth of knowledge, interpersonal skills, drive, character, perseverance, ambition, creative thinking, and willingness to challenge accepted orthodoxy. Others contend that the tests are methodologically flawed, although that view has been widely challenged. Some analysts also point to the fact that the United States has traditionally scored in the middle of the industrialized countries yet has consistently outperformed those countries in actual economic growth and scientific innovation.

It is interesting to note that some educational leaders in high-scoring countries are warning us not to place too much emphasis on high test scores. In a recent statement, the deputy minister of education and training in Vietnam, which now places 12th in the world in the cited OECD report, cautions that many Vietnamese students have learned by rote, are unable to solve unique problems, do not have the interpersonal skills needed for work, and subsequently perform poorly in college and careers. For additional comments from educators in that country, see M. I. Hanoi’s article on The Economist website. For a critique of China’s test-driven system, see Diane Ravitch’s review of Yong Zhao’s book Who’s Afraid of the Big Bad Dragon? Why China Has the Best (and Worst) Education System in the World.

It is worth noting that the previously mentioned PIAAC test, which shows US students underperforming, is primarily a problem-solving and application test and is thus arguably more predictive of adult performance. Also worrisome is the slowing growth in performance for our students in the middle and higher socioeconomic groups. Most troublesome is the large number of students failing to reach rudimentary levels. Given that jobs in the future will increasingly demand higher educational levels, it is essential that all students at least reach basic levels, which the international assessments do measure. Therefore, the results of these international assessments do matter, and they matter more now than ever.

Misguided Reform Policies

Consequently, today our country and its educators are faced with a major policy question: Which strategies have the best chance of rectifying our relatively low performance? Two different approaches are vying for acceptance—Test-and-Punish and Build-and-Support.

I can sympathize with the passion that drives reformers’ desire to crack down on low-performing schools and incompetent educators. There are certainly many distressing examples of malfunctioning or mediocre schools and classrooms. We should do everything in our power to address these problems. It is also true that there are individual teachers and entire school staffs who have given up striving for excellence and are merely marking time until retirement. Most are reacting to overwhelming problems: traumatized and alienated students, indifferent parents, a hostile political climate, inept leadership, and extremely high levels of stress. Many of these disaffected practitioners have become angry at their school conditions and constant public vilification. As a result, they resist improvement measures and urge their union representatives to be uncooperative and unyielding.

Unfortunately, many reformers have responded with a counterproductive solution—upping the ante by exerting more pressure on these disheartened, exhausted, or underperforming educators. There are much more effective ways to improve teacher and school performance, as exemplified by numerous schools that have managed to rekindle the professional energies of a demoralized staff and correct genuinely dreadful situations. These successful programs use a Build-and-Support approach that focuses on instruction, building trust, and creating effective teams.

I can also understand how anti-reformers fuel reformers’ frustration when they downplay the fact that some schools are underperforming and that many teachers and schools require substantial improvement. For an incisive rebuttal to those who assert that “schools are doing just fine,” see Grant Wiggins’s letter on the subject and Jal Mehta’s article. Reformers’ anger and frustration are understandable, but they do not justify ill-advised approaches, especially when effective alternatives exist. To make matters worse, many reform measures have done little good and much harm.

Conventional reformers tend to base their improvement initiatives on a misguided belief in high-stakes testing and market-driven competition. For more than a decade, this two-pronged approach has produced only limited results. Yet these same reform measures have caused considerable collateral damage to schools and resulted in a disastrous drop in teacher morale and the appeal of teaching as a profession.

Reformers assume that schools will not improve by themselves and, therefore, will require external pressure in the form of high-stakes accountability based on standardized reading and mathematics test scores. Reform advocates assert that the best way to improve student performance is to fire the lowest-performing three to five percent of teachers; reward the superstars; encourage competition and disruption by expanding charter schools and choice; and close neighborhood schools with the lowest scores, or replace their staffs, or convert the schools into charter schools. In fact, many reformers promote wholesale privatization of public education by replacing public schools with charters or with private schools funded by vouchers. For a decade since the passage of No Child Left Behind (NCLB), these proposals have been put into practice on every level—nationally, in most states, and in many districts.

Until recently, the federal government and a multitude of states and school districts have heavily promulgated this reigning get-tough-on-teachers-and-schools dogma and the belief in the power of market-based competition, choice, and charters. In December 2015, Congress repealed NCLB and the Race to the Top expansion sponsored by the Obama administration. The new Every Student Succeeds Act (ESSA) ameliorated some of the more extreme measures of the reform movement sponsored nationally and is a welcome course correction. ESSA shifts much decision making to the states and local levels, so that is where the debate on which way to improve our schools will now primarily occur. Although there is a growing shift away from the “reform” agenda, discredited proposals continue to be supported by far too many political and opinion leaders, wealthy individuals, editorial boards, think tanks, and well-funded organizations. This support persists in spite of the evidence from the most successful districts and states such as Massachusetts and now California, which have adopted an instructionally driven, supportive approach that is grounded in modern management techniques of engagement. For more about exemplary districts and states, see Exemplary Models of Build-and-Support.

Since a mainstay of reform policy is to hold schools accountable for improving test results, it is only fair to judge the reform movement by how well it improved student performance on tests—live by the scores, die by the scores. Admittedly, a once-a-year standardized test only offers a limited measure of student learning, but reformers have had no compunction about using those test results to fire teachers, close schools, and privatize entire districts. Thus, in fairness, they cannot reasonably object to using the same criteria to evaluate their reforms.

Meager National Results

Much to the reformers’ chagrin, their strategies have produced only meager results, though this lack of success has not tempered their advocacy. In the 1990s, the overall average scores of the National Assessment of Educational Progress (NAEP), our well-respected national scorecard, revealed a slow but steady rise in student performance. That was before the enactment of the national No Child Left Behind (NCLB) legislation in 2001, which established the primacy of high-stakes accountability.

After the passage of NCLB, the growth of NAEP scores slowed. During the past few years the adoption of punitive “reform” measures has intensified, fully supported and required by the Obama administration. Since 2009, as test-based teacher evaluations have spread and harsh consequences for failure to meet unattainable goals have been triggered, gains in NAEP scores have essentially halted. In contrast, our most successful districts and the highest-performing nations have continued to improve by adopting a more supportive strategy.

NAEP relies on student samples unconnected to individual teachers or particular schools. Thus, the test cannot be linked to accountability systems and carries no consequences for low performance. Consequently, NAEP is one of the most accurate tests of student achievement, albeit a limited measure. The test avoids artificially inflated results that are generally associated with high-stakes testing. In those cases, results are skewed by damaging behaviors such as spending excessive time on test preparation and outright gaming of the system. The NAEP processes of sampling and lack of consequences also minimize curriculum narrowing for test prep purposes and its deleterious effects on deeper learning and broader instruction. Here are the results from the most recent period. Nationwide, 12th-grade 2013 NAEP reading and mathematics scores were unchanged from 2009. Since 2009, fourth-grade scores were also flat for mathematics and increased only two points in reading; eighth-grade scores increased only one point in reading and declined one point in math.

Equally concerning is the fact that our students are performing significantly below students in other industrialized countries and are continuing their slide. In 2012, results from the Program for International Student Assessment (PISA) showed declines from the already low 2007 levels: a six-point decrease in math, four points in reading, and five points in science. See also the Welner and Mathis policy memo for a recent summary of the lack of improvement in student achievement during the proliferation of the more severe reform measures. Similar disappointing results were documented worldwide for countries that pursued test-driven high-stakes accountability systems and competition strategies.

Finally, the gap between high-income and low-income students has substantially increased in the past 25 years due to rising income inequality and, according to one scholar, has widened 30–40%. Gary Sasso writes:

As the income disparity has increased, so has the educational achievement gap. According to Sean F. Reardon, professor of education and sociology at Stanford University, the gap for children from high- and low-income families is at an all-time high—roughly 30 to 40 percent larger among children born in 2001 than among those born 25 years earlier.

High school graduation rates are another measure used to gauge school effectiveness. From 2011 to 2014, they inched up from 79% to 82%, although they are still falling further behind our competitors. This rise was most likely caused by a combination of efforts initiated by schools, credit recovery strategies for students not qualifying for graduation (some of which are questionable), changing attitudes of students stemming from the increasingly dismal outlook for high school nongraduates, and a more realistic assessment of the importance of educational attainment by low-income, minority, and immigrant families. California, which did not pursue a Test-and-Punish strategy, actually improved at a rate higher than the national increase. Furthermore, our country’s college graduation rates are also slipping behind those of many industrialized nations.

Another disconcerting finding is that in many urban districts the gap between low-income students, English-language learners, and minority students and their peers is increasing. This is particularly the case in districts pursuing large-scale charter school expansion and “reform strategies.” Similarly, Scholastic Aptitude Test (SAT) scores, one of the two major college entrance exams, have tumbled in the past five years, dropping seven points in 2015 alone. ACT scores, the other major college entrance exam, were flat. The drop in SAT scores cannot be explained by changes in the composition of the test takers or by the increasing numbers of students taking the test.

To be fair, primarily in the early 2000s, there were some positive changes in instruction due to increased pressure from accountability efforts and the availability of test results for previously neglected subgroups. These changes translated into increases in fourth- and eighth-grade mathematics scores. Also, contrary to conventional opinion, international tests showed our lowest-performing students closing the gap with, though still remaining significantly behind, their counterparts in the top-performing countries, while our top students had stalled. In the mid-2000s, we also saw a recovery from a severe dip in the share of students qualified for college, which returned to about 40% in math and reading, roughly where it stood in 1998.

These increases in NAEP scores, however, were more sporadic than those in the decade before high-stakes, test-driven accountability became widespread. After NCLB, there were no NAEP score increases at the 12th grade and none in reading, and overall growth seems to have ceased during the recent era of more stringent reforms. Some growth was masked by Simpson's paradox, in which overall scores can stay flat even while every subgroup improves, because the mix of students shifts toward groups with lower average scores, such as a growing share of lower-scoring minority or second-language students (a hypothetical illustration appears below). Even taking this paradox into account, growth for most subgroups was minimal after 2009, with the exception of the gains Hispanic students made in reading.
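To see how this masking can happen, here is a minimal hypothetical sketch in Python; the subgroup shares and scores are invented for illustration and are not actual NAEP figures.

# Hypothetical illustration of Simpson's paradox: every subgroup's average
# score rises, yet the overall average stays flat because enrollment shifts
# toward the lower-scoring subgroup. All numbers are invented.

def overall_average(groups):
    # Weighted average: sum of (enrollment share * subgroup score).
    return sum(share * score for share, score in groups)

# (enrollment share, average score) for two hypothetical subgroups
year_1 = [(0.70, 250), (0.30, 220)]
year_2 = [(0.55, 253), (0.45, 226)]  # both subgroups improve; the mix shifts

print(overall_average(year_1))  # 241.0
print(overall_average(year_2))  # about 240.9 -- essentially flat

Although each subgroup gains three to six points between the two hypothetical years, the overall average is essentially unchanged because the lower-scoring subgroup has grown from 30% to 45% of test takers.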

In stark contrast to these disappointing national scores, many districts, states, and countries made significant gains during the same period, reaching higher average scores or proficiency levels on measures such as NAEP. That is because they followed a broader, more supportive approach. For an in-depth discussion of this Build-and-Support approach in action, see the series of companion articles How Top Performers Build-and-Support.

Collateral Damage Caused by Reform

Whatever limited growth tough accountability measures produced has been overshadowed by the deleterious effects high-stakes test accountability has had on instruction, teacher efficacy, and morale. In addition to lackluster test scores, reform initiatives have led to a severe narrowing of the curriculum because of their focus on high-stakes math and reading tests. Superficial teaching to the test, at the expense of deeper learning, has proliferated. For scholarly treatments of the concept of deeper learning, see the work of Jal Mehta and Sarah Fine, Maggie Lampert, and Mike Amarillas's blog post.

History, science, humanities, art, and other crucial subjects have been decimated. The Council of the Great City Schools report (2015) found that increases in testing time did not improve instruction but did cause significant collateral damage. For more on this topic, see the FairTest report and the excellent book The Test: Why Our Schools Are Obsessed with Standardized Testing—But You Don’t Have to Be, written by Anya Kamenetz.

Perverse accountability incentives have encouraged teachers and administrators to game the system by devoting inordinate time to test preparation, concentrating only on students near cutoff points, and, in some tragic cases, outright cheating. In many states, reformers have promoted unfair, unproven reward-and-punishment tools, which have discouraged collaboration among teachers, thwarted the building of effective teams, and caused a severe drop in morale. Finally, reform nostrums have diverted attention from, de-emphasized, or belittled Build-and-Support policies that can actually produce substantial results.

Have Individual Components of Reform Worked?

Not only has the reform movement failed to produce results overall, but reputable evaluations of individual reform measures, such as turnaround schools, charter schools, merit pay, and test-based school and teacher accountability, have found either nonexistent or trivial effects. See the series of companion articles Why Conventional School “Reforms” Have Failed for a detailed discussion of the reasons these measures failed to produce results.

Even when small gains are detected, they are substantially below the improvements brought about by the initiatives at the heart of Build-and-Support. To put these findings in perspective, a full standard deviation (1.0 SD) difference in test performance translates to between one and two years of additional instruction. Analyses of the reform efforts that did register gains reveal inconsequential effect sizes of 0.05 to 0.15 SD, far below those of programs that actually work. These meager results did not dissuade the reform community from trumpeting the reported increases as major breakthroughs.
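As a rough back-of-the-envelope check, the following sketch converts effect sizes into months of instruction using the one-to-two-year equivalence stated above; the nine-month school year and the linear conversion are simplifying assumptions, since the actual relationship varies by grade level and subject.

# Convert an effect size (in standard deviations) into approximate months of
# instruction, assuming 1.0 SD equals one to two school years of learning and
# a nine-month school year. Illustrative only; not a precise benchmark.

MONTHS_PER_SCHOOL_YEAR = 9

def effect_to_months(effect_sd, years_per_sd):
    return effect_sd * years_per_sd * MONTHS_PER_SCHOOL_YEAR

for effect in (0.05, 0.15, 0.8, 1.0):
    low = effect_to_months(effect, years_per_sd=1)
    high = effect_to_months(effect, years_per_sd=2)
    print(f"{effect:.2f} SD -> roughly {low:.1f} to {high:.1f} months")

On these assumptions, a 0.05 to 0.15 SD effect amounts to only a few weeks to a couple of months of additional learning, whereas a 0.8 SD program, such as the strong early reading programs cited below, corresponds to most of a school year or more.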

In his meta-analysis of 150,000 research studies involving 250 million students, John Hattie lists the effect sizes of 150 of the most popular school improvement interventions. He found several programs near or above the 1.0 SD level, though it is important to note that he treats 0.4 SD as roughly a year's expected growth. Among the effective practices were visible learning (making children's thinking and understanding transparent and enlisting students in the educational process), 1.44 SD; formative evaluation (getting timely information on how well a student is progressing), 0.90 SD; response to intervention (early intervention after good first teaching), 1.07 SD; and classroom discussion, 0.82 SD.

Many other measures were close to the 1.0 SD range, many times the minimal effect size of 0.04 to 0.05 SD that a Center for Research on Education Outcomes (CREDO) study found for urban charter schools compared with their public school counterparts. Perhaps most importantly, Hattie found that the largest gains were produced by improvement efforts that focused on developing collaboration, team building, and continuous-improvement capacity. He calls this “The Power of Collaborative Expertise.”

Many of the high-scoring programs and ideas are integral to the Build-and-Support strategy and are staples of the active classroom instruction called for in the Common Core State Standards. These measures offer a clear rebuttal to the claim that the only way to improve public education is through governance reforms, such as charters and the competitive pressure they engender, or through high-stakes accountability based on tests. Of the 150 improvement strategies evaluated, charter schools ranked 114th, near the bottom of Hattie's range of effect sizes, with almost no advantage over expected normal growth.

Alyson Lavigne and Thomas Good conducted an extensive review of the efficacy of reform measures such as turnaround schools and merit pay. In their 2014 book, Teacher and Student Evaluation: Moving Beyond the Failure of School Reform, they report finding either insignificant gains or no effect at all. Likewise, Grover Whitehurst found small gains of between 0.05 and 0.15 SD for some reform strategies and no gains for many others. He compared these small improvements to the much larger boosts achieved by programs such as dropout prevention (1.0 SD) and excellent early reading phonics programs (0.8 SD). He also points to the What Works Clearinghouse, which lists a raft of programs with effect sizes many multiples of those found for charter schools, turnaround schools, or merit pay.

This article has supported the contention that while we have much to do to improve our schools, the “reform agenda” was not the right medicine and has not produced results. The series of companion articles Why Conventional School “Reforms” Have Failed explains why this agenda has been unsuccessful.

A Tale of Two Cities

Two New Jersey school districts provide powerful examples of the difference between Test-and-Punish and Build-and-Support. Union City, New Jersey, undertook extremely effective but low-key school improvement measures. The success of its Build-and-Support approach is chronicled in David L. Kirp's recent book, Improbable Scholars: The Rebirth of a Great American School System and a Strategy for America's Schools. Just seven miles away, the Newark, New Jersey, district implemented a "reform strategy" that was highly disruptive to schools and communities and produced minimal positive outcomes for students. After five years of a very public and controversial school improvement effort, Newark's experiment was unsuccessful. For a complete account of what went wrong, see The Prize: Who's in Charge of America's Schools? by Dale Russakoff, and for an illuminating contrast of the two approaches, see David L. Kirp's article "How to Fix the Country's Failing Schools: And How Not To."

School leaders in Union City followed an incremental, basic approach concentrating on long-term improvement of instruction through strong content, team and trust building, collaboration, and continual reevaluation. Student achievement rose substantially, as did teacher and community engagement. From being on the brink of a state takeover in 1989 due to low performance, the district improved to the point that, by 2014, 89% of Union City students were graduating from high school in four years. Across the grades, test scores have nearly caught up to those of suburban New Jersey students, who are among the top performers in the US. A recent report by Stanford researchers Reardon, Kalogrides, and Shores found a strong correlation between socioeconomic status and student performance, as well as connections between levels of segregation and opportunity gaps. A few districts substantially beat the odds; Union City was one of them.

An abstract of Kirp’s Improbable Scholars provides a cogent summary of the lessons learned from Union City:

No school district can be all charismatic leaders and super-teachers. It can’t start from scratch, and it can’t fire all its teachers and principals when students do poorly. Great charter schools can only serve a tiny minority of students. Whether we like it or not, most of our youngsters will continue to be educated in mainstream public schools.

The good news, as David L. Kirp reveals in Improbable Scholars, is that there’s a sensible way to rebuild public education and close the achievement gap for all students. Indeed, this is precisely what’s happening in a most unlikely place: Union City, New Jersey, a poor, crowded Latino community just across the Hudson from Manhattan. The school district–once one of the worst in the state–has ignored trendy reforms in favor of proven game-changers like quality early education, a word-soaked curriculum, and hands-on help for teachers. When beneficial new strategies have emerged, like using sophisticated data crunching to generate pinpoint assessments to help individual students, they have been folded into the mix.

The results demand that we take notice–from third grade through high school, Union City scores on the high-stakes state tests approximate the statewide average. In other words, these inner city kids are achieving just as much as their suburban cousins in reading, writing, and math. What’s even more impressive, nearly ninety percent of high school students are earning their diplomas and sixty percent of them are going to college. Top students are winning national science awards and full rides at Ivy League universities. These schools are not just good places for poor kids. They are good places for kids, period.

The experience in Newark stands in stark contrast to the success in Union City. Cory Booker, then Newark's Democratic mayor and now a US senator, joined forces with New Jersey's Republican governor, Chris Christie, and persuaded Mark Zuckerberg to donate $100 million; another $100 million in matching contributions was raised. The reformers' goal was to make Newark a national model of high-stakes accountability and the market-driven reform agenda: test-based teacher and school evaluation with rewards and punishments, large-scale expansion of charters, and the closure of underperforming public schools.

Newark's schools had previously been taken over by the state. Booker and Christie, with advice from a small group of state reform leaders and donors, hired Cami Anderson as superintendent. At the time, Anderson had limited experience managing a school system but was a staunch supporter of reform. Under her leadership, expensive consultants were hired, decisions were made with virtually no transparency, and Test-and-Punish was ardently pursued. Anderson did hire some effective principals, and many dedicated educators in the district recommitted themselves to improving low-performing schools. However, fiscal mismanagement and a top-down management style frustrated their efforts.

Initially, Anderson opposed the wholesale conversion of public schools to charters, viewing that effort as detrimental. Her focus was on building up low-performing schools rather than closing them, albeit with a management style that excluded and alienated teachers and principals. Unfortunately, Anderson eventually succumbed to pressure from Christie, Zuckerberg, and her reform advisors, who believed that public schools would never perform, could not be improved, and therefore should be replaced by charters. The district closed large numbers of neighborhood schools, disrupting communities, children, and families and draining needed improvement resources from the remaining public schools. Anderson rightly complained that she was “expected to turn Newark’s public schools into a national model, yet as children left for charters—and state funds followed them—she would be continually closing schools and dismissing teachers, social workers, and guidance counselors.”

Some Newark charters performed well, but on the whole the majority of students wound up in worse schools farther from home. Christie did not help matters when he slashed public school funds and supported increased resources for charters. The project in Newark was a bust. Five years after it began, student gains were minimal but parents and an entire community were left seething. Educators in Newark were utterly demoralized. A chastened Zuckerberg then switched philosophies, investing $120 million in low-income Bay Area schools that were committed to pursuing a more collaborative and supportive approach.

BBS Companion Articles

Why Conventional School “Reforms” Have Failed
Reformers Target the Wrong Levers of Improvement
Teacher and School Evaluations Are Based on Students’ Test Scores
Charter Schools Are Not the Key to Improving Public Education
Four Nostrums of Conventional School Reform
Reformers Allowed Their Rhetoric to Be Hijacked
How Top Performers Build-and-Support
Ground Efforts in Unassailable Research
Provide Engaging Broad-Based Liberal Arts Curriculum
Provide High-Quality Instruction
Build Teams and Focus on Continuous Improvement
Provide Adequate School Funding
Lessons Learned from Successful Districts
Exemplary Models of Build-and-Support

Reference Notes

The Problem of Low Performance: Real or Hype?
Carnoy, M., García, E., & Khavenson, T. (2015, Oct 30). Bringing It Back Home: Why State Comparisons Are More Useful Than International Comparisons for Improving U.S. Education Policy. Economic Policy Institute. http://www.epi.org/publication/bringing-it-back-home-why-state-comparisons-are-more-useful-than-international-comparisons-for-improving-u-s-education-policy/

Distressing International Results
Goodman, M. J., Sands, A. M., & Coley, R. J. (2015). America’s Skills Challenge: Millennials and the Future. Educational Testing Service. http://www.ets.org/s/research/29836/

Barshay, J. (2013, Dec 3). Top US Students Fare Poorly in International PISA Test Scores, Shanghai Tops the World, Finland Slips. http://educationbythenumbers.org/content/top-us-students-fare-poorly-international-pisa-test-scores-shanghai-tops-world-finland-slips_693/

Hanushek, E.A., & Woessmann, L. (2015). Universal Basic Skills: What Countries Stand to Gain. OECD. http://www.keepeek.com/Digital-Asset-Management/oecd/education/universal-basic-skills_9789264234833-en#page1

Sparks, S. D. (2016, Feb 10). OECD: U.S. Efforts Haven’t Helped Low Performers on Global Math, Reading Tests. http://blogs.edweek.org/edweek/inside-school-research/2016/02/OECD_American_efforts_low_performers.html?cmp=eml-enl-eu-news2-RM

Do International Tests Fairly Reflect Socioeconomic Factors?
Carnoy, M., García, E., & Khavenson, T. (2015, Oct 30). Bringing It Back Home: Why State Comparisons Are More Useful Than International Comparisons for Improving U.S. Education Policy. Economic Policy Institute. http://www.epi.org/publication/bringing-it-back-home-why-state-comparisons-are-more-useful-than-international-comparisons-for-improving-u-s-education-policy/

The Horace Mann League and the National Superintendents Roundtable. (2015, Jan). School Performance in Context: Indicators of School Inputs and Outputs in Nine Similar Nations. The Horace Mann League and the National Superintendents Roundtable. http://www.hmleague.org/fullreport/

Are International Tests Useful Measures of Achievement?
Strauss, R. (2013, Feb 1). Do International Test Scores Matter? Renewing America. http://blogs.cfr.org/renewing-america/2013/02/01/education-do-international-test-scores-matter/ See also Tucker, M. (2015, Nov 19). The Iceberg Effect: A Reply to James Harvey and Charles Fowler. http://blogs.edweek.org/edweek/top_performers/2015/11/the_iceberg_effect_a_reply_to_james_harvey_and_charles_fowler.html and Ravitch, D. (2013, Dec 3). My View of the PISA Scores. https://dianeravitch.net/2013/12/03/my-view-of-the-pisa-scores/ and Tucker, M. (2015, Nov 24). ESEA Reauthorization and Standards: A Chance to Do It Right. Top Performers. http://blogs.edweek.org/edweek/top_performers/2015/11/

Thanhnien News. (2013, Dec 7). Vietnam Deputy Education Minister Not Convinced by Global Test. Thanhnien News. http://www.thanhniennews.com/education-youth/vietnam-deputy-education-minister-not-convinced-by-global-test-18276.html

M.I. (2013, Dec 12). Very Good on Paper: Education in Vietnam. The Economist. http://www.economist.com/blogs/banyan/2013/12/education-vietnam

Ravitch, D. (2014, Nov 20). The Myth of Chinese Super Schools. The New York Review of Books. http://www.nybooks.com/articles/archives/2014/nov/20/myth-chinese-super-schools/

National Governors Association. (2013–2014). America Works: Education and Training for Tomorrow's Jobs: The Benefits of a More Educated Workforce to Individuals and the Economy. National Governors Association Chair's Initiative. http://www.nga.org/cms/home/nga-center-for-best-practices/center-publications/page-other-publications/col2-content/main-content-list/america-works-the-benefit-of-a-m.html

Misguided Reform Policies
Hart, M. (2015, Jul 6). Research: Collaboration Is Key for Teacher Quality. The Journal. http://thejournal.com/articles/2015/07/06/research-collaboration-is-key-for-teacher-quality.aspx

Wiggins, G. (2013, Oct 23). Is Significant School Reform Needed or Not?: An Open Letter to Diane Ravitch (and Like-Minded Educators). https://grantwiggins.wordpress.com/2013/10/23/is-significant-school-reform-needed-or-not-an-open-letter-to-diane-ravitch-and-like-minded-educators/

Mehta, J. (2014, Jul 18). Five Inconvenient Truths for Traditionalists. http://blogs.edweek.org/edweek/learning_deeply/2014/07/five_inconvenient_truths_for_traditionalists.html

Meager National Results
Ratner, G. M. (2015, Feb 11). Independent Test Results Show NCLB Fails. Fair Test. http://www.fairtest.org/independent-test-results-show-nclb-fails

The Nation’s Report Card. (2013). Are the Nation’s Twelfth-graders Making Progress in Mathematics and Reading? http://www.nationsreportcard.gov/reading_math_g12_2013/#/

Burns, D., & Darling-Hammond, L. (2014, Dec 18). Teaching Around the World: What Can TALIS Tell Us? Stanford Center for Opportunity Policy in Education. https://edpolicy.stanford.edu/publications/pubs/1295

Welner, K. G., & Mathis, W. J. (2015, Feb 12). Reauthorization of the Elementary and Secondary Education Act: Time to Move Beyond Test-Focused Policies. National Education Policy Center. http://nepc.colorado.edu/publication/esea

Masters, G. N. (2014, Dec). Is School Reform Working? Australian Council for Educational Research. http://research.acer.edu.au/policyinsights/1/

Reardon, S. F. (2013, Apr 27). No Rich Child Left Behind. http://opinionator.blogs.nytimes.com/2013/04/27/no-rich-child-left-behind/?_r=1

Sasso, G. M. (2016, Jan 7). To the 1 Percent Pouring Millions into Charter Schools: How About Improving the Schools That the Vast Majority of Students Actually Attend? http://www.salon.com/2016/01/07/to_the_1_percent_pouring_millions_into_charter_schools_how_about_improving_the_schools_that_the_vast_majority_of_students_actually_attend/

Ujifusa, A. (2015, Dec 15). National Graduation Rate Increases to All-Time High of 82 Percent. http://blogs.edweek.org/edweek/campaign-k-12/2015/12/national_graduation_rate_incre.html?cmp=eml-enl-eu-news2-RM

Pondiscio, R. (2016, Jan 13). The Phoniest Statistic in Education. http://edexcellence.net/articles/the-phoniest-statistic-in-education?mc_cid=6794bd3d0d&mc_eid=ebbe04a807

Brounstein, K., & Yettick, H. (2015, Feb 24). Rising Graduation Rates: Trend or Blip? Education Week. http://www.edweek.org/ew/articles/2015/02/25/rising-graduation-rates-trend-or-blip.html?cmp=ENL-EU-NEWS2

DeArmond, M., Denice, P., Gross, B., Hernandez, J., Jochim, A., & Lake, R. (2015, Oct). Measuring Up: Educational Improvement and Opportunity in 50 Cities. Center on Reinventing Public Education. http://www.crpe.org/publications/measuring-educational-improvement-and-opportunity-50-cities

Strauss, V. (2015, Sep 8). What the New SAT Scores Reveal About Modern School Reform. https://www.washingtonpost.com/blogs/answer-sheet/wp/2015/09/08/what-the-new-sat-scores-reveal-about-modern-school-reform/

DiCarlo, M. (2015, Dec 4). Evidence from a Teacher Evaluation Pilot Program in Chicago. http://www.shankerinstitute.org/blog/evidence-teacher-evaluation-pilot-program-chicago

Carnoy, M., & Rothstein, R. (2013, Jan 28). What Do International Tests Really Show about U.S. Student Performance? Economic Policy Institute. http://www.epi.org/publication/us-student-performance-testing/

Petrilli, M. J., & Finn, C. E., Jr. (2015, Apr 8). College Preparedness Over the Years, According to NAEP. http://edexcellence.net/articles/college-preparedness-over-the-years-according-to-naep

The Nation’s Report Card. National Assessment of Educational Progress (NAEP). http://www.nationsreportcard.gov/

Collateral Damage Caused by Reform
Mehta, J., & Fine, S. (2015, Dec). The What, Where and How of Deeper Learning in American Secondary Schools. Jobs for the Future. http://www.jff.org/publications/why-what-where-and-how-deeper-learning-american-secondary-schools

Lampert, M. (2015, Dec). Deeper Teaching. Jobs for the Future. http://www.jff.org/publications/deeper-teaching

Amarillas, M. (2016, Feb 4). Deeper Learning, Metacognition, and Presentations of Learning. http://blogs.edweek.org/edweek/learning_deeply/2016/02/deeper_learning_metacognition_and_presentations_of_learning.html?utm_source=feedblitz&utm_medium=FeedBlitzRss&utm_campaign=learningdeeply

Hart, R., Casserly, M., Uzzell, R., Palacios, M., Corcoran, A., & Spurgeon, L. (2015, Oct). Student Testing in America’s Great City Schools: An Inventory and Preliminary Analysis. Council of the Great City Schools. http://cgcs.org/site/default.aspx?PageType=3&ModuleInstanceID=312&ViewID=7B97F7ED-8E5E-4120-848F-A8B4987D588F&RenderLoc=0&FlexDataID=2146&PageID=257

Fair Test. (n.d.). Reports: High Stakes Testing Hurts Education. http://fairtest.org/reports-high-stakes-testing-hurts-education See also Švigelj-Smith, M. (2015, Feb 5). The High Cost of High-Stakes-Testing: (Spoiler Alert! It Hurts Students with Disadvantages the Most!) https://msvigeljsmith.wordpress.com/2015/02/05/the-high-cost-of-high-stakes-testing-spoiler-alert-it-hurts-students-with-disadvantages-the-most/

Kamenetz, A. (2015). The Test: Why Our Schools Are Obsessed with Standardized Testing—But You Don’t Have to Be. New York: PublicAffairs/Perseus Book Group.

Have Individual Components of Reform Worked?
Hattie, J. (2009). Visible Learning: A Synthesis of Meta-Analyses Relating to Achievement. New York: Routledge.

Center for Research on Education Outcomes. (2015). Urban Charter School Study: Report on 41 Regions. Stanford University. http://urbancharters.stanford.edu/summary.php

Hattie, J. (2015, Jun 16). What Works Best in Education: The Politics of Collaborative Expertise. Australian Policy Online. http://apo.org.au/resource/what-works-best-education-politics-collaborative-expertise See also Hirsh, S. (2015, Nov 18). Leverage the Power of Collaborative Expertise. http://blogs.edweek.org/edweek/learning_forwards_pd_watch/2015/11/leverage_the_power_of_collaborative_expertise.html?utm_source=feedblitz&utm_medium=FeedBlitzRss&utm_campaign=learningforwardspdwatch

Lavigne, A. L., & Good, T. L. (2014). Teacher and Student Evaluation: Moving Beyond the Failure of School Reform. New York and London: Routledge. See also three excellent books on the failure of the “reform” program: Ravitch, D. (2013). Reign of Error: The Hoax of the Privatization Movement and the Danger to America’s Public Schools. New York: Alfred A. Knopf; Ravitch, D. (2010). The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education. New York: Basic Books; and DuFour, R. (2015). In Praise of American Educators and How They Can Become Even Better. Bloomington, IN: Solution Tree.

Whitehurst, G. J. (2009, Oct). Don’t Forget Curriculum. Brookings. http://www.brookings.edu/research/papers/2009/10/14-curriculum-whitehurst

What Works Clearinghouse. U.S. Department of Education. Institute of Education Sciences. http://ies.ed.gov/ncee/wwc/

A Tale of Two Cities
Kirp, D. L. (2013). Improbable Scholars: The Rebirth of a Great American School System and a Strategy for America’s Schools. Oxford: Oxford University Press.

Russakoff, D. (2015). The Prize: Who’s in Charge of America’s Schools? Boston: Houghton Mifflin Harcourt.

Kirp, D. L. (2016, Jan 9). How to Fix the Country’s Failing Schools: And How Not To. The New York Times. http://www.nytimes.com/2016/01/10/opinion/sunday/how-to-fix-the-countrys-failing-schools-and-how-not-to.html?ref=opinion&_r=1

Berwick, C. (2013, Apr 1). Can the Model for Urban School Reform Be Found in Union City, New Jersey? https://nextcity.org/daily/entry/can-the-model-for-urban-school-reform-be-found-in-union-city-nj

Rich, M., Cox, A., & Bloch, M. (2016, Apr 29). Money, Race, and Success: How Your School District Compares. The New York Times. http://www.nytimes.com/interactive/2016/04/29/upshot/money-race-and-success-how-your-school-district-compares.html?_r=3

Goldman School of Public Policy. (2013). Abstract of Kirp, D. Improbable Scholars: The Rebirth of a Great American School System and a Strategy for America’s Schools. https://gspp.berkeley.edu/research/selected-publications/improbable-scholars-the-rebirth-of-a-great-american-school-system-a-strateg

Cramer, P. (2015, Sep 10). When an Outsider Arrives to Shake Up a School System, a Tightrope Walk Follows. http://ny.chalkbeat.org/2015/09/10/when-an-outsider-arrives-to-shake-up-a-school-system-a-tightrope-walk-follows/#.VlFOETZdE2w

Nocera, J. (2015, Sep 8). Zuckerberg’s Expensive Lesson. The New York Times. http://www.nytimes.com/2015/09/08/opinion/joe-nocera-zuckerbergs-expensive-lesson.html?ref=todayspaper See also Weber, M. (2015, Sep 8). Book Review: “The Prize” by Dale Russakoff. http://jerseyjazzman.blogspot.com/2015/09/book-review-prize-by-dale-russakoff.html?m=1. For another thoughtful analysis of Russakoff’s book, see Thompson, J. (2015, Oct 10). Will Reformers Learn a Lesson from Newark? Dale Russakoff’s “The Prize” Could Help. http://www.livingindialogue.com/will-reformers-learn-a-lesson-from-newark/
