High-stakes tests are being reconsidered

This article was originally published in The Notebook. In August 2020, The Notebook became Chalkbeat Philadelphia.

When it comes to standardized testing, Helen Gym and Bill Green may not exactly be on the same page, but they’ve both been reading the same book.

“It’s like any pendulum – has it swung too far in favor of standardized tests? Probably,” said Green, the reform-minded School Reform Commission member and champion of data-driven decisions.

“We want to be looking at quantitative measures, not just qualitative measures. But we’ve definitely gone too far to the extreme,” said Gym, a self-described populist, who won a City Council seat on a platform of protecting public education.

Gym and Green are just two of a growing number of policymakers and advocates who believe that the time has come to re-assess what schools assess and how they do it.

Even as testing embeds itself ever deeper in school district practices nationwide, growing evidence suggests that high-stakes tests can detract from quality instruction while unfairly punishing underfunded districts and struggling schools.

Parents, students, teachers, and local leaders are growing increasingly vocal about their concerns, and decision-makers at every level – local, state and federal – now say they want to find ways to make tests more effective and less intrusive.

And in Washington, Congress is on the verge of rewriting the No Child Left Behind Act, retaining testing mandates, but giving states new flexibility about how they intervene in schools that score poorly.

“People are piling on more and more tests. It’s smart to look at what’s necessary and what’s not,” said Kate Shaw, head of the local nonprofit Research for Action.

“But the question is not, ‘Are we going to have testing or not?’” she said. “The question is, ‘What is the appropriate kind of assessment, for what purpose, and for what audience?’”

Keystones in the crosshairs

In Pennsylvania, that debate has recently focused on the increasingly unpopular Keystone exams. Passing scores on these tests are slated to become graduation requirements in 2017.

Originally intended to boost both quality and equity, the Keystones are emerging as Pennsylvania’s leading example of high-stakes testing gone wrong.

“They take the test in May. They find out that they failed over the summer. Absolutely nothing transpires between September and January for them to improve … and they’re forced to take the exact same test again in January,” said Gym. “That is inane.”

Pennsylvania began administering the three Keystones – Literature, Algebra I and Biology – in 2012. Students have been failing in droves ever since.

In response, legislators from both parties are moving to delay full Keystone implementation and possibly revise the exams’ high-stakes nature entirely – an effort that Gov. Wolf’s administration says it’s ready to support.

Such moves reflect widespread unhappiness with Keystones in Pennsylvania. Problems aren’t concentrated in any particular region or demographic.

“You really do have a statewide issue,” said Research for Action’s Adam Schott.

Students do best on the Literature exam, but in 2014-15, one in four failed statewide. Biology pass rates are the lowest, at 59 percent. In Philadelphia, Central and Masterman High School students virtually all passed Biology, but those are the exceptions, even among magnet schools; at Girls High and Philadelphia High School for Creative & Performing Arts, for example, the number was less than 60 percent.

And in comprehensive neighborhood schools, the pass rates can be shockingly low: 33 percent passed Biology at Northeast; 24 percent at George Washington; 16 percent at Furness; and 2 percent or less at Strawberry Mansion, Bartram, Kensington Urban Education, and Kensington Health Sciences.

Nor is it just an urban problem: only 81 percent of Radnor High students passed Biology, and 64 percent at Cheltenham High.

That leaves some questioning the quality of the exam itself.

“There’s something wrong if people who go on to elite colleges can’t pass,” said Marissa Golden, a Lower Merion School District board member.

All this leaves the Keystones with few friends in Harrisburg. In June, the Senate unanimously approved a delay bill. This fall, the House education committee unanimously backed the same bill.

So far, not one legislator has voted to keep the 2017 Keystone deadline in place.

And what’s more, the House committee added an amendment giving the Department of Education six months to develop alternatives to the Keystones and reconsider its entire range of assessments.

“There’s [a] sense that there’s got to be a different system of measuring academic success,” said State Rep. James Roebuck, the committee’s ranking Democrat. “I’m not sure if we’re going to come back to the Keystones being that measure.”

The Department of Education is “supportive” of a delay, spokesperson Nicole Reigelman said in a statement, and is prepared to explore alternatives that “holistically assess student and school achievement.”

This is in keeping with Gov. Wolf’s broader interest in reducing districts’ overall reliance on tests. The Education Department’s one request: “PDE would prefer a term of 12 months” to complete its analysis and find new options, Reigelman said.

Vital data at stake

In the Philadelphia School District, officials are likewise rethinking their approach. A new task force will launch in January to reexamine the District’s full range of tests and assessments.

“It should be a stakeholder discussion,” including students and parents, said SRC Chair Marjorie Neff. “The big question is, how do we maintain high standards for kids without making tests barriers to entry or exit? That’s the big tension around testing.”

This kind of reconsideration is taking place nationwide.

“It certainly feels like this is a moment where people are taking a step back and trying to think sanely and logically,” said Shaw, of Research for Action.

The danger, she said, is that unhappiness with Keystones and PSSAs will undermine the broader effort to address a problem that affects even prosperous districts.

“There is a reason this all started,” said Shaw. “[Testing] has gotten out of hand, but we do have kids that are poorly served by public schools.”

Among the most important contributions testing is making, Shaw said, is clear data about performance of “subgroups” – including racial minorities, low-income students, and English language learners.

It was that aspect of the testing agenda that led a coalition of civil rights groups – including the national NAACP – to warn parents against opting out.

“I’m very sympathetic to the concerns of the civil rights groups, who are very concerned about the inability to monitor progress, equity, school quality,” said Gym.

But many of the same groups are also deeply concerned about policymakers’ over-reliance on test data, and the lack of follow-up that actually helps students. The Pennsylvania NAACP, for example, strongly opposes the Keystone graduation requirement. Test data isn’t worth much, Gym said, if all it does is drive school closings and deny students diplomas.

“We’re starting to understand that poor students aren’t just failing,” Gym said. “They’re being failed.”

From year-end test to feedback loop

Beyond concerns about Keystones – a top priority for Gym when she takes office in January – lies a thicket of other controversies about test-reliant measures. These include the various school performance ratings and the use of “value-added” measurements for teacher evaluations, a technique that the American Educational Research Association says needs “substantial investment” before it’s ready for prime time.

Gym says the key to shaping policy at this stage is to organize parents, students, and teachers to put pressure on decision-makers.

Union leaders say the same. The Philadelphia Federation of Teachers’ Jerry Jordan calls the new culture “test-crazed.” He and his slate face a challenge for union leadership in 2016 from the Caucus of Working Educators.

His opponent for president, teacher Amy Roat, downplays policy differences with Jordan but promises a more activist approach.

“We look at [PSSAs] for about five minutes,” Roat said. “The public should be thanking God that we look beyond test scores, or we would be giving up on these kids.”

It’s a criticism heard from many quarters: For all the data that’s collected, Keystones and PSSAs don’t give school-level decision-makers much timely, day-to-day help.

Lower Merion’s Golden said, “We have a new superintendent, and he has contacted the state repeatedly to get more information … to see how the students are doing and what can be done. He’s been quite frustrated. … It’s hard to use them as an educational tool.”

Even a data-lover like Green called such concerns “exactly right.” Scores that arrive months after a student leaves a classroom aren’t much help to teachers, he said.

But to Green, the way forward should involve more data collection, not less.

“What the research shows is that you need a constant feedback loop … between 10 and 16 curricula-based assessments each year,” said Green, citing Mastery and KIPP as charter schools that benefit from such systems. “That is real-time information going to teachers and to the administration.”

The notion of yet another layer of assessment worries Roat and other advocates who remain deeply suspicious of the testing industry.

“I think there’s been so much pushback that they’re changing their game plan,” she said.

But RFA’s Shaw said her study of one form of ongoing, classroom-based evaluation – so-called “formative assessments” – shows real promise.

“We’ve been watching teachers use those tools very effectively and very enthusiastically,” Shaw said – especially when those teachers help develop the measures. “When you have strong assessment tools aligned with strong curriculum, you will have teachers telling you over and over again, ‘This is terrific. My students are learning more.’”

No such tool is a “silver bullet,” Shaw said.

And no student test can capture the full range of factors that determine “school quality,” said Gym. Stability of leadership, experience of teachers, resources for students – “these are all measures of quality,” she said.

So for many, the move to reconsider testing is an encouraging sign that legislators and leaders are ready to make smart midcourse corrections.

But the other possibility is that leaders who see the academic shortfalls revealed by Keystones are now simply looking for a way to avoid addressing them.

“Is their work really animated by concern over overtesting?” asked Schott. “Or is their concern to avoid any accountability moving forward?”
