Review finds flaws in discipline report

The Fordham Institute looked at District data related to suspensions. The National Education Policy Center took issue with Fordham’s findings.

This article was originally published in The Notebook. In August 2020, The Notebook became Chalkbeat Philadelphia.

A recent report criticizes the Philadelphia School District’s move to reduce the use of suspensions for minor offenses, but the National Education Policy Center at the University of Colorado is pushing back against the report, saying that it cherry-picks data.

The report’s authors reply that the critics don’t understand standard methods in econometrics, a branch of economics that uses mathematics and statistics to quantify economic relationships.

The report criticizing the District, which came out in December, was produced by two independent researchers for the Fordham Institute, a think tank that is a member of the American Legislative Exchange Council, which pushes conservative legislation in states around the country.

The left-leaning National Education Policy Center, established in 2010 to review existing research and conduct its own, took issue with “logical fallacies, overly simplified interpretations of findings, and inflammatory language” in the report, particularly the foreword.

That foreword, written by two Fordham staffers, criticizes the Obama administration for promoting a “near-total reversal on school discipline policy” by encouraging alternatives to suspension, which “has led to empirically unproven strategies such as restorative justice.”

The Fordham report contends that the data show that a change in the District’s code of conduct in the 2012-13 school year aimed at reducing out-of-school suspensions for minor infractions had “no long-term impact” on those suspensions.

It also says that “associational” evidence indicates that the policy change improved attendance, but not academic achievement, of students who had been suspended; that it harmed the achievement of never-suspended students in disadvantaged schools; and that it led to an “increase in racial disproportionality” in the use of suspensions.

The National Education Policy Center disputes those claims in its review released in February, at least as they are stated by the authors.

“There are several instances in which the findings and conclusions presented in the report either contradicted or overlooked results from the original research studies,” the review reads. “The pattern of inconsistent reporting of the original study results indicates some ‘cherry-picking’ in order to make a partisan argument against current federal policy.”

Matthew Steinberg, one author of the Fordham report and an assistant professor at the University of Pennsylvania’s Graduate School of Education, said he did not write the foreword and put nothing in the report about the Obama administration’s recommendation to reduce suspensions.

The report cites research on school discipline done by Daniel Losen, director of the Center for Civil Rights Remedies – part of the Civil Rights Project at UCLA – and he strongly disagrees with how the conclusions are framed.

“The findings actually suggest that the reforms were effective when implemented correctly,” Losen said. “But [Fordham’s report] is being used to criticize the guidance that had nothing to do with this homegrown Philadelphia initiative. … That’s a kind of disingenuous use of the research” by Fordham.

Steinberg and Losen do agree on one thing: The study indicates that, when school discipline reforms are not coupled with additional supports and resources for teachers, they are less likely to succeed – especially in the highest-poverty schools.

Steinberg said: “Schools that may be struggling the most in terms of students’ academic performance and students’ behavioral concerns are the schools that may have the least capacity to implement these types of reforms and likely require additional district-level resources.”

The Fordham report examined four research questions, and the foreword presents four summary statements about what the data showed. These are the statements the National Education Policy Center says cherry-pick the results.

Number of low-level suspensions

The first summary statement is: “Changes in district policy had no long-term impact on the number of low-level ‘conduct’ suspensions.”

It goes on to say that most schools did not fully comply with the policy change. However, the method used to draw these conclusions lumps the schools together, regardless of whether they attempted to implement the policy.

Researchers divided schools into three categories: “full compliers,” which eliminated the suspensions; “partial compliers,” which reduced the number but did not eliminate them; and “non-compliers,” which did not reduce these suspensions.

In the first year after the change, full compliers went from giving 2.5 percent of students a conduct suspension to zero percent. Among partial compliers, the rate declined from 6 to 3 percent. Only among non-compliers did the rate rise, from 3 to 6 percent.

The Fordham report asserts that any reductions are offset by increases in the final year studied. But the report does not provide a breakdown, so it’s impossible to see whether the decline in suspensions at complying schools was offset by a larger increase among non-complying schools that year.

The report was based on two research papers, one of which does give some raw data, and the data include more school years. It breaks down the conduct suspensions into a per-capita rate averaged over the six years leading up to the reform (the “pre-reform period”) and the two years afterward (the “post-reform period”).

Per capita suspensions of all kinds in the District declined 27 percent from the pre-reform period to the post-reform period; across all other Pennsylvania school districts, the rate declined only 15 percent. The pattern was similar for conduct-related suspensions, which declined by 5 per 100 students in the District, compared with a decline of less than 1 per 100 students across all other districts in the state. These numbers were not included in the Fordham report.

Harold Jordan, a senior policy advocate for the ACLU of Pennsylvania who conducted a statewide analysis of school discipline policies for his own 2015 report, said the Fordham report treats the policy change as if it were the only one of its kind during the years studied, but that’s not the case. (Jordan is a longtime Notebook board member.)

Jordan, whose children attended schools in the District, was involved in negotiations on changes to the code of conduct, and he said they were “gradual” over several years and were not isolated to 2012-13, the year in question in the report.

Steinberg dismissed this as irrelevant for districtwide changes, saying it would become relevant only if policies were put in place at some schools but not others.

Jordan said that, under the leadership of Superintendent Arlene Ackerman from 2008 to 2011, principals around the District became alienated from the central office and felt that edicts were being handed down without consulting school-level leaders. Superintendent William Hite took over in 2012, the year of the policy change, and he has worked to repair relationships with principals by giving them more autonomy to make changes at the school level.

In other words, ever since the year that the policy was implemented, there have been exactly the kinds of school-level policy changes that Steinberg was referring to.

Attendance and achievement

The Fordham report’s second summary statement says: “Changes in district policy were associated with improved attendance – but not improved achievement – for previously suspended students.”

The National Education Policy Center disagrees.

“This [claim] contradicts the original study, which concluded that previously suspended students made marginal but statistically significant gains in math proficiency,” the review states. “Findings consistently indicated that outcomes improved in the post-reform period for students who had been suspended prior to the policy change.”

The more pervasive problem is that this measure includes all students in Philadelphia’s public schools – even those at the schools that continued suspending students for low-level misconduct.

“It seems like there were benefits where the policy was fully implemented,” Losen said. “You can’t say: ‘Here’s the negative consequence of the reform using some data from schools where the reform was not carried out at all.’”

Jordan said it was unsurprising to find mixed results from a policy change just two years after it had been adopted.

“When the District implemented the suspension ban for kindergartners, it wasn’t implemented perfectly either,” Jordan said. “The District, after that school year, made an effort to educate principals about the policy. … That’s the struggle.”

Effects on non-suspended students

According to the third summary statement, students who were never suspended “experienced worse [academic] outcomes in the most economically and academically disadvantaged schools, which were also the schools that did not (or could not) comply with the ban on conduct suspensions.” The report implies in several places that this is a “consequence” of the policy change, despite labeling the evidence “associational.”

The policy center’s response focused on two aspects of this statement. First, it added missing context: As the Fordham report acknowledged, the control group of students who were never suspended came from more-affluent schools.

“Such schools are likely to be meaningfully different on several dimensions that are related to test scores and attendance, but were not accounted for in the study,” the review states. “High poverty and racially segregated schools tend to have lower teacher quality and offer weaker opportunities for academic engagement than schools serving more advantaged students.”

Thus, the policy center says, the control group of students who were never suspended is “meaningfully different” from the partial and non-complier groups it was compared to, and that “likely influenced the outcomes.”

Steinberg calls this a misunderstanding of a common practice in econometrics, which “compares changes in outcomes across two groups – one treated by a policy intervention and another that’s not treated.”

“It’s not the case that they need to be observably equivalent,” he said. “The pre-trends must be parallel, meaning [data from] those two groups evolved similarly leading up to the intervention.”
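The practice Steinberg is describing is commonly known as a difference-in-differences comparison. The sketch below, using purely hypothetical numbers rather than anything from the Fordham report, shows the basic arithmetic: the two groups can start at different levels, and what matters is whether one group’s outcomes changed more than the other’s after the policy took effect.

```python
# A minimal sketch of a difference-in-differences estimate. All numbers
# are hypothetical; they are not figures from the Fordham report.

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's outcome minus the change in the
    comparison group's outcome over the same period."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical average attendance rates before and after a policy change.
# The groups start at different levels, which is fine; the identifying
# assumption is that their trends were parallel before the intervention.
estimate = diff_in_diff(
    treated_pre=88.0, treated_post=91.0,   # schools affected by the policy
    control_pre=92.0, control_post=93.0,   # comparison schools
)
print(estimate)  # 2.0: treated schools gained 3 points, comparison schools 1
```

That is the sense in which the two groups need not be “observably equivalent”: the estimate leans on the assumption that, without the intervention, their outcomes would have kept moving in parallel.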

The National Education Policy Center’s response went on to question what it called “a strange manipulation of reasoning.”

“Achievement declined among students who were not suspended prior to the reform but attended schools that continued to use conduct suspensions,” it says. “They argue the decline in achievement among these ‘rule-abiders’ was due to the inability of these schools to suspend misbehaving peers. However, this interpretation contradicts the fact that these schools continue to use suspensions.”

Steinberg said the policy center’s critique misses the point that never-suspended students at schools that fully complied were unaffected.

“We don’t know what resources existed at these schools that fully complied” compared to those that did not, he said. “So the important question remains: What were these schools doing that may or may not have insulated the non-suspended peers?”

Racial disproportionality

The foreword’s final summary statement is: “Revising the district’s code of conduct was associated with an increase in racial disproportionality at the district level.”

The metric used to draw this conclusion is the ratio of suspensions for black students to suspensions for white students.

To Losen’s dismay, Fordham’s report does not include the raw data – the total number of suspensions issued to each group. Losen pointed out that an increase in this ratio does not necessarily represent an increase in the number of black students suspended; it could instead result from suspensions of white students declining faster than suspensions of black students, he said.

Steinberg said: “That might be true, but the gap got bigger.” He added that ratios are the common measure of racial disproportionality.
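A quick hypothetical, using made-up numbers rather than District data, shows the arithmetic behind Losen’s objection: when suspensions fall for both groups but fall faster for white students, the ratio rises even as fewer black students are suspended.

```python
# Hypothetical illustration of Losen's point (these are not District
# figures): the black-to-white suspension ratio can grow even when
# suspensions decline for both groups, if they decline faster for
# white students.

def disproportionality_ratio(black_rate, white_rate):
    """Ratio of the black suspension rate to the white suspension rate."""
    return black_rate / white_rate

# Suspensions per 100 students before and after an imagined reform.
before = disproportionality_ratio(black_rate=8.0, white_rate=4.0)  # 2.0
after = disproportionality_ratio(black_rate=4.0, white_rate=1.0)   # 4.0

# The ratio doubles from 2-to-1 to 4-to-1 even though suspensions of
# black students were cut in half.
print(before, after)
```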

Losen wants the raw data to be released so other researchers can review it, as an alternative to the peer review that the report has not yet received. One of the papers on which the report is based has been peer reviewed, and the other is awaiting review.

“They gave us the equation and the results coming out of their equation, but didn’t give us the variables being input into the equation to get those results,” Losen said.

He asked the authors for that data, but they did not provide it. A spokesperson for the University of Pennsylvania Graduate School of Education said that the data was obtained through a data-sharing agreement with the District and contained student names that could not be shared by law. Losen, however, was not asking for names. Still, sharing any of the data would have violated Penn’s data-sharing agreement with the District, according to the spokesperson.

Greg Windle is a staff reporter at the Notebook.