Why Conventional School “Reforms” Have Failed
The Reformers Target the Wrong Levers of Improvement

by Bill Honig

The school reform movement has failed to produce results overall, and reputable evaluations have shown that individual reform measures also have proved to be ineffective. Turnaround schools, charter schools, merit pay, and test-based school or teacher accountability have had either nonexistent or trivial effects. In his book Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement, John Hattie (2008) writes that even when reforms produce small gains, they fall far below the improvements brought about by validated initiatives.

Reformers have operated under an extremely unsophisticated view of the educational landscape and how best to influence it. This causes two fundamental errors. First, they aim their improvement efforts at a limited and weak set of levers for change. Second, they undertake solutions that have little or no basis in research or practical experience.

Reformers fundamentally misunderstand how schools and districts work. They have primarily focused their efforts on indirect structural changes and governance reforms—limiting the power of teachers’ unions and scaling back workplace protections, using threats and incentives to pressure teachers and administrators, and promoting competition by expanding charter schools. These strategies fail to appreciate the complex factors that impact school quality and the appropriate places to focus improvement efforts. Other direct and more powerful leverage points have been shown to influence educational performance more than those areas traditionally targeted by reformers.

Factors That Impact School Quality and Student Achievement

Schools are complicated. Among the factors that influence school quality and student achievement are:

  • individual teachers and all the potential influences on those teachers including the type of content they provide, their pedagogical practice, and their level of engagement
  • members of the school community who are responsible for resource allocation, team building, and developing capacity for continuous improvement—the principal, key teachers, parents, and community leaders
  • parents’ role in supporting their child’s education
  • the district—superintendent, staff, and board—which hires teachers and principals, establishes curricular guidelines and creates curriculum, adopts materials, provides professional development and other supports for schools and teachers, allocates broad resources, defines accountability, involves the community, and ideally creates a positive climate
  • the state apparatus—the governor, state school boards, and the legislature—which funds schools, adopts standards and curricular frameworks, administers special programs, sometimes reviews instructional materials, approves charter schools, establishes state accountability systems, and provides social supports for needy children
  • the federal government, which influences all these actors through the requirements of numerous federal programs such as the strict conditions that were mandated by the now-repealed No Child Left Behind (NCLB), the Individuals with Disabilities Education Act (IDEA), or even the new, more flexible Every Student Succeeds Act (ESSA)
  • poverty levels and range of social support systems
  • several more key stakeholders such as schools of education, textbook publishers, the research community, the blogosphere, think tanks, opinion makers, and political leaders

Any successful improvement effort must include strategies to improve the performance of each of these major stakeholders and, crucially, engage them in working toward a common goal. This requires a much more positive, comprehensive, and considered approach than the school reform community has offered thus far. An example of comprehensive policy can be found in Greatness by Design (2012), which was developed for California by Tom Torlakson, the state superintendent of public instruction. He formed a prestigious commission chaired by Linda Darling-Hammond, one of the most respected school improvement researchers in the country, and Chris Steinhauser, superintendent of the Long Beach Unified School District, which was designated one of the top districts in the world. The resulting policy document is a superb example of the more supportive and comprehensive strategy needed.

Another example of this more sophisticated approach is the excellent guide pertaining to professional development found in the Learning Policy Institute’s (2015) publication Maximizing the Use of New State Professional Learning Investments. An example of policy that addresses all the necessary components of reform at the district level is the Leadership Planning Guide California, which was produced by the California Consortium for the Implementation of the Common Core State Standards. These topics are fully covered in Lessons Learned from Successful Districts.

Individual Reform Initiatives Are Based on Misguided Assumptions

Even when reformers’ Test-and-Punish and Choice, Charters, and Competition strategies are directed at weak leverage points, their individual measures must still succeed and avoid causing extensive collateral damage. Unfortunately, the specific measures in the reform playbook rely on discredited and faulty assumptions about the best ways to improve schools. This is why these individual reforms have produced limited or nonexistent results.

The rest of this article focuses on two of the faulty assumptions of the school reform movement: the belief that threats, pressure, and incentives work and the use of standardized math and reading test scores as the most important measures of student learning. For a detailed discussion of the lack of overall success of conventional reform initiatives, see Have High-Stakes Testing and Privatization Been Effective?

Threats and Pressure Are Not Effective

One major fallacy underlying the “reform” strategy is the flawed assumption that teachers and administrators do not care about improving educational performance and will not try to improve unless they are threatened or pressured by positive and negative incentives. This is often communicated in a politically seductive way: “It is unconscionable that many low-income students are failing. Schools and teachers must be held accountable.” Yet, while the sentiment is superficially appealing, pressure usually backfires.

Almost all school staffs want to do the best job possible. As professionals, they desire to perfect their performance and improve student achievement, but they do not necessarily possess the strategic or tactical know-how to accomplish those goals. Many work in extremely difficult school situations: bereft of capacity-building resources and student social supports such as health clinics, isolated from collaborating with other teachers, and lacking structures and techniques to help them grow professionally. The fear engendered by high-stakes accountability makes the situation worse: it leads schools to narrow the curriculum, focus on test preparation to the detriment of deeper learning, and game the system, and it discourages collaboration and spreads disaffection. A more productive strategy relies on a positive, engaging approach and concentrates on developing the leadership and infrastructure to bolster continuous improvement efforts of all teachers at a school.

The punitive strategy of Test-and-Punish has little evidentiary support and only meager backing from questionable research conducted by a few economists. For example, Milton Friedman and Eric Hanushek have argued that improvement will occur only if strong incentives push schools and districts to upgrade. Reformers have leaned heavily on those ideas—advocating the necessity of competition, consequences, and high-stakes evaluation.

However, the belief that positive or negative incentives work has been thoroughly discredited by a long history of findings that show such strategies do not produce improved student performance. In 2010, the National Research Council released Incentives and Test-Based Accountability in Education, a report edited by Michael Hout and Stuart Elliott. Hout and Elliott reviewed the research on incentives, specifically whether positive incentives such as bonuses for teachers or negative incentives such as threats of dismissal had any positive effect. They found that these policies neither produced improvements in student achievement nor brought about changes in instruction.

Fifty years ago, W. Edwards Deming warned of the negative side effects of an overreliance on evaluation strategies. Fear tends to make employees disengage, narrow their efforts, or game the system so they appear compliant. It diverts attention from developing cooperative teams and structures for continuous improvement, and it diminishes the motivation to participate in them. This ruinous situation is well known in the social sciences and is articulated in Campbell’s law, as explained by Diane Ravitch (2012):

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

A New York Times opinion piece by Robert Wachter (2016), a prominent physician, reinforces the point that emphasis on evaluation of teachers, or doctors, actually causes more harm than good.

Incentive Schemes Sabotage Collaboration

Specifically, reliance on incentive schemes hampers or diverts attention from collaboration, one of the main strategies for improving school performance. In fact, in The Missing Link in School Reform (2011), Carrie Leana argues that collaboration at the school site is the most powerful strategy for improving instruction. She found that instructional conversation and help from fellow teachers outweigh all other improvement initiatives. Professor Leana calls into question school reforms that pursue test-driven rewards and punishments. Since, according to Professor Leana, only an estimated five percent of US schools are actually managed this way, the unrealized potential of expanding this approach far outweighs that of other strategies. Team building around powerful instruction and curriculum should be one of our major priorities. She also emphasizes that this approach requires:

  • training principals how to promote collaboration and holding them accountable for it
  • building the infrastructure to support instructional improvement and team building
  • striving to get more talented people into our schools
  • avoiding rhetoric and policies that make collaboration more difficult

Esther Quintero (2015), a management expert, has published a series of articles on the crucial importance of building social capital. In addition to being ineffective, pressure and illegitimate negative incentives lower morale and undermine positive working conditions at the school site—another key component of successful school improvement. A post by John Papay and Matthew Kraft (2015) summarized the research on the importance of a positive professional environment:

An emerging body of research now shows that the contexts in which teachers work profoundly shape teachers’ job decisions and their effectiveness. Put simply, teachers who work in supportive contexts stay in the classroom longer, and improve at faster rates, than their peers in less-supportive environments. And, what appear to matter most about the school context are not the traditional working conditions we often think of, such as modern facilities and well-equipped classrooms. Instead, aspects that are difficult to observe and measure seem to be most influential, including the quality of relationships and collaboration among staff, the responsiveness of school administrators, and the academic and behavioral expectations for students.

In conclusion, increasing accountability pressure on schools has not produced the promised results and has sabotaged the very collaboration and engagement necessary for improvement.

Standardized Tests Are Not the Best Measures of School or Teacher Quality

Another major defect in school reform thinking is the misplaced faith that a one-time annual snapshot offered by a multiple-choice test in math and language arts is the best, or even an accurate, way to gauge school or teacher performance. These widely used tests do have a legitimate role if used sparingly. They can give feedback to schools, districts, or even individual teachers, identifying potential areas that need improvement or confirming that the school is on the right track. Standardized tests do give some sense of where a school or district ranks among comparable jurisdictions, and they do provide crucial sub-group information for low-income, minority, learning-disabled, or ELL students. But these expensive, ubiquitous assessments are among the least useful measures for improving instruction and performance, and they come with huge educational costs, especially when they are tied to evaluation or reward schemes.

To begin with, testing only math and reading (and a smidgeon of science) ignores important areas of instruction—history, civics, humanities, the arts, physical education, and most of science. It also devalues other central aims of education. The result has been a considerable narrowing of instruction and a constricted view of educational purposes. For a broader perspective on educational purposes, see The Three Goals of Public Education.

Another limitation of these large-scale tests is that scores almost wholly mirror the income levels and special needs of a school’s students, which raises the question: “What is being tested?” Even within math and language arts, the end-of-year general tests currently used for school and teacher accountability usually emphasize limited, basic skills. Thus, the tests encourage teachers to neglect the deeper learning required for highly educated students, which can only be assessed by measures like essays, complex applications, and performances.

Moreover, there are much better ways to provide schools and teachers the data they need to improve instruction. Most teachers know how their students are doing. Utilizing teacher judgment of performance, enhanced by locally administered formative assessments, is a much more powerful strategy.

Essays, end-of-unit and end-of-course tests, performances, experiments, certificates of mastery, projects, extracurricular activities, and portfolios are all more helpful than existing state or national tests. These authentic performance assessments provide a richer array of information that goes beyond content knowledge to application. They also assess important life skills such as perseverance, the ability to work in groups, communication skills, and self-monitoring. The Innovation Lab Network Performance Assessment Project at Stanford, the New Hampshire Performance Assessment Network, and the New York Performance Assessment Consortium are good sources of these types of assessments. Linda Darling-Hammond and her colleagues (2014) have written an excellent thought piece on the subject, and Stanford sponsors the Performance Assessment Resource Bank, which identifies the best K–12 performance tasks in math, English language arts, science, and history-social studies.

Regrettably, although formative, authentic assessments provide the best data to assist in improving instruction, conventional school reformers have not embraced them. There is the perception that the assessment instruments are not independent enough for high-stakes accountability. This may be due to a misplaced distrust of teachers and educators or because the assessments are viewed as too expensive, time consuming, and subject to manipulation. The new nationwide Smarter Balanced Assessment Consortium (SBAC) and Partnership for Assessment of Readiness for College and Careers (PARCC) tests, which many states gave in 2015, are an improvement over previous tests. Still, to provide a more accurate picture of student achievement, their results need to be substantially augmented by the classroom and school measures I’ve described.

More importantly, too much emphasis on tests for accountability purposes ignores other gauges of school effectiveness such as graduation rates, course taking, honors, extracurricular activities, career preparation, and student and teacher engagement. So whether test scores improve or lag, they only partially measure how students are doing, and they are not informative enough to evaluate the effectiveness of schools or teachers. The new Every Student Succeeds Act (ESSA), which passed in 2015 and replaced NCLB and Race to the Top, allows states to use a much broader array of assessment measures.

In Mission High: One School, How Experts Tried to Fail It, and the Students and Teachers Who Made It Triumph, Kristina Rizga (2015) chronicles how measuring school effectiveness by test scores alone can lead to harmful conclusions. Diane Ravitch reviewed Rizga’s book in “Solving the Mystery of the Schools” (2016) in The New York Review of Books. Ravitch comments:

Mission is a “failing school” because it has low test scores. When Rizga [the author] first entered Mission in 2009, it was one of the lowest-performing schools in the nation, as judged by standardized test scores. And yet, contrary to the test scores, 84 percent of its graduates were accepted to college, and other indicators were positive.

Rizga followed several students who had recently moved to the US and who consequently scored low on standardized tests while making substantial academic progress.

In her review, Ravitch explains:

One of the six students Rizga followed closely, an immigrant from El Salvador named Maria, asked her, “How can my school be flunking when I’m succeeding?” Maria arrived at Mission High School knowing no English. After only one year in the U.S., she had to take the same state tests as other students.

Kristina Rizga writes:

By eleventh grade she was writing long papers on complex topics like the war in Iraq and desegregation. She became addicted to winning debates in class . . . . In March 2012 Maria and her teachers celebrated her receiving acceptance letters to five colleges, including the University of California at Davis, and two prestigious scholarships.

Ravitch sums up:

Rizga devotes chapters to the students she gets to know well, who blossom, as Maria did, as a result of their interactions with dedicated Mission teachers. She also devotes chapters to teachers who devote themselves to their students with intense enthusiasm. What the teachers understand that reformers . . . do not is that human relationships are the key to reaching students with many economic and social problems.

Rizga realized that standardized test scores are not the best way to measure and promote learning. Typically, what they measure is the demographic profile of schools. Thus, schools in affluent white suburbs tend to be called “good” schools. Schools that enroll children who are learning English and children who are struggling in their personal lives have lower scores and are labeled “failing” schools. Hundreds, if not thousands, of such schools have closed in the past decade. . . .

Ravitch gives Rizga the final word:

Some of the most important things that matter in a quality education—critical thinking, intrinsic motivation, resilience, self-management, resourcefulness, and relationship skills—exist in the realms that can’t be easily measured by statistical measures and computer algorithms, but they can be detected by teachers using human judgment. America’s business-inspired obsession with prioritizing “metrics” in a complex world that deals with the development of individual minds has become the primary cause of mediocrity in American schools.

In conclusion, using test scores alone can easily misrepresent the performance of a school. Focusing on limited, basic-skills tests and attaching potential high-stakes consequences to them cause substantial harm to instruction, engagement, and student performance.

As discussed in Have High-Stakes Testing and Privatization Been Effective?, test-driven threats and incentives lead to narrowing of the curriculum, devoting inordinate time and resources to test preparation, concentrating on those students just below cut-points, gaming the system, and discouraging collaboration among teachers. This is all to the detriment of good instruction and deep, lasting student learning.

A Council of the Great City Schools (2015) report found that increases in testing time did not improve instruction but did cause significant collateral damage. For a heart-wrenching testament to the devastation done by the US obsession with test-driven education, read The Test: Why Our Schools Are Obsessed with Standardized Testing—But You Don’t Have to Be by Anya Kamenetz (2015).

In 2015, President Obama and former secretary of education Arne Duncan issued a “mea culpa.” They cautioned against over-testing and the harm caused by too much attention to standardized tests. President Obama stated, “Learning is about so much more than filling in the right bubble,” and he called for “tests to be high-quality, a limited part of the curriculum, and just one measurement of a student’s progress.”

In a letter to Arne Duncan, Georgia state school superintendent Richard Woods aptly described our system of test-based accountability:

Our broken model of assessment is too focused on labeling our schools and teachers, and not focused enough on supporting our students. Our current status quo model is forcing our teachers to teach to the test. We need an innovative approach that uses tests to guide instruction, just as scans and tests guide medical professionals. Oftentimes, we hear teachers called professionals because they have the knowledge and skill set to reach the needs of their individual students, yet in our accountability measures we have not supported or given value to diagnostic tools and tests that teachers need to fully utilize that knowledge or those skills. We must find a balance between accountability and responsibility.

Resistance to over-testing has been gathering steam in many local districts and states and at the national level, as exemplified by the spreading opt-out movement. Unfortunately, testing still looms large in the daily life of most teachers and students. Simply reducing the time devoted to the administration of standardized tests does not repair the damage caused by spending inordinate instructional time on test preparation, narrowing the curriculum, or making questionable use of test scores for high-stakes personnel decisions.

Recent Developments

7/30/2016 Consistent with the failure of pay-for-performance efforts in education, such schemes are also problematic for hospitals. http://harvardmagazine.com/2016/06/are-hospital-pay-for-performance-programs-failing

