Litmus Test for Urban School Districts

by Siegfried Engelmann
July 2005

A man who had never seen a bicycle received a kit, completely unassembled, with instructions. The man felt he had a sense of machinery, and he didn't see the need for some parts. Also, he added parts he felt were necessary. The man believes that his assembly process was appropriate. His bike runs, but it's hard to pedal and hard to steer. Question: How much of the observed performance is created by the way he assembled the bike? If the man had accurate information on bikes that had been assembled properly, he would have a basis for comparing his bike with the standard model. As it is, he doesn't know what the bike's potential is because the bike's performance is the product of two variables, the machinery and the way it is assembled.

The same problem exists with urban school districts. They implement approaches according to their rules or standards, rather than the developers' guidelines. The result is the same as that of the bike. The performance of students is now the product of two variables—the approach and the way the district implemented it. Just as the man has no basis for comparing his bike with those assembled according to the manufacturer's specifications, the district has no basis for comparing the results it achieved with those that would be generated by the developers' guidelines.

The district could obtain this information easily, however, simply by implementing the approach in a few schools according to the developers' guidelines. That's what the litmus test is—a controlled, carefully monitored, small-scale test of possibly effective approaches. Unfortunately, there is nothing to suggest that urban districts are capable of implementing any effective approach with fidelity. That may be why they haven't discovered what works well.

It's important to find out whether urban districts can pass the litmus test, because if they can't, they should be reconstituted so that they have the capacity to implement with fidelity, and so that they base their decisions on the outcomes of small-scale tests of well-implemented approaches.

For instance, Chicago has implemented Direct Instruction its way with some low-performing schools, but is now discarding DI. A CPS researcher, Dr. David Bill Quinn, observed that other reported studies on DI have much larger effect sizes than Chicago achieved. He recommended either investigating why this discrepancy occurred, or dropping the program.

Apparently Chicago did not consider the first possibility. Instead it has adopted a new initiative: 100 small Renaissance Schools for at-risk children. Although this is an elaborate commitment, Chicago claims that it is not like the man and his bike, and that it has comparative data on the effectiveness of small schools. It cites Harlem as an example of how improvement occurred in small schools. The problem is that these data are confounded: the demography of the Harlem district changed a great deal during the period of improvement, and that change may have been the primary cause of the gains. When the district superintendent from Harlem tried the same approach (fuzzy math and whole language) in San Diego, it was not successful.

Even though the Direct Instruction implementation in Chicago is not consistent with the developers' guidelines, some DI project schools are comparatively successful. For instance, Woodlawn (which achieved the largest gain in the district between 2003 and 2004) went from 40% of children passing the Illinois State Reading Test in 1999 to 62% in 2004. Carver Primary went from 29% in 1997 to 44% in 2004. Three points of interest about Carver: (1) it has over 1000 children in K through 3; (2) it reached its highest passing percentage (49%) in 2000, while the National Institute for Direct Instruction (NIFDI) was working with the school as an external sponsor; (3) NIFDI later dropped Carver and the other schools it worked with in Chicago because district standards and practices made it impossible to fully implement the model in any of them. So at least in NIFDI's opinion, Carver could have done a lot better.

Chicago hopes to revitalize Carver with non-AFT teachers and a learning model called "Pathways," which claims to develop personal interest in fine arts, technology, math and science, journalism, and world culture. There's no compelling data to suggest this approach will work, but (as Chicago apparently reasons) there's no data to show it won't.

Maybe Chicago is right and the small school will prove to be magic. After all, the best-performing DI school in Baltimore, City Springs, is a small school.

Or possibly, small schools for at-risk populations are extremely hard to implement well. At City Springs, small size created a host of implementation problems. If a couple of teachers were absent, it was very difficult to cover for them with trained teachers or aides, much harder than it is in a larger school like Carver, which has many more classrooms at each grade level.

Also, even in a well-implemented small school, grouping students homogeneously for instruction becomes a nightmare in grades 4 and above, because a large percentage of incoming students perform lower than any continuing students in those grades. Accommodating new entrants is far more challenging than it is in a larger school, which may have more than one classroom per grade dedicated to accommodating incoming low performers. A small school with more than 25% annual turnover (like City Springs) often has to penalize the continuing students by slowing their pace so the classroom can accommodate low-performing incoming students.

Other small-school problems include training and deploying coaches, accommodating students who have been absent for a while, and training teachers who are performing unacceptably. Possibly Chicago has strategies for addressing these problems. Or possibly Chicago will learn about these problems only after the 100 small schools have been in operation a while.

But even if "Pathways" has the potential to work well and the management has super smart strategies for implementation and training, what evidence is there that the district could implement the approach in a way that would achieve its potential? The answer is revealed by the litmus test. If districts don't have the machinery needed to implement on a small scale, there is no reason to believe that they'll be able to faithfully implement anything that is instructionally sound on a larger scale.

The format of the litmus test parallels the way the man with the bike could secure information about the bike's potential. Instead of ordering one bike, the man orders two, assembles one according to the book and one his way. Now he has a strict basis for comparison and can evaluate which practice is better. For the litmus test, the district identifies four models. Two would have substantial evidence of effectiveness (such as DI) and would be implemented the developers' way. Two would be approaches that the district prefers (such as the Renaissance Schools) and would be implemented the district's way. The performance of the effective models would serve as the yardstick for evaluating the other two models.

Each approach would be implemented for three years in three comparable at-risk schools (12 schools total). The district would necessarily waive whatever policies and practices interfere with the effective models being implemented according to the developers' specifications (given that these are legal, humane, and feasible). The district's standards, preservice and inservice training, and other practices may be in conflict with the models being tested. Waiving standards and procedures should not be a serious barrier. After all, the schools failed even though they followed these standards and practices.

Certainly, the effective models would have to stay within reasonable budget limitations and could not demand things like unusually gifted teachers or three aides in every classroom. A reasonable demand, however, would be that principals be directed to follow the model's provisions. (This did not happen with DI in Chicago.)

The evaluation of the district's performance on the litmus test would be based both on the degree to which each model was implemented and on how well the schools performed. The evaluation of implementation fidelity would be performed by an independent agency and would use objective measures: how well teachers and principals follow specified procedures and schedules. Students would also be tested on standardized achievement tests and state tests (with the testing scrupulously monitored), but the results would not be released to the schools or the administration until the three-year litmus test was completed. This provision ensures that judgments of how faithfully a model is implemented would not be biased by how well students performed on those tests.

The litmus test would provide at least five benefits for the district.

1. It would generate accurate comparative data about what works well, and about the relative cost of various approaches.

2. It would reveal modifications in the district's infrastructure that are needed to empower the district with the capacity to implement approaches faithfully.

3. It would save millions of dollars on large-scale implementations of approaches that would not produce worthwhile performance gains in a small-scale test.

4. It would protect large numbers of students from being subjected to shoddy instruction by limiting the number of students used in the "experimental" test of new approaches.

5. It would establish a basic professional standard for the district, which is that nothing is adopted until it demonstrates its worth in a small-scale, carefully monitored study.

The litmus test is not only the scientific and logical way for districts to discover how effective various approaches are and what's wrong with district practices. It is also what any smart business would do: make prototypes and test them rather than launch into full-scale production without any solid performance information about the product or how to use it effectively.

Ironically, some urban school districts claim to fashion themselves after hard-nosed business practices. They have CEOs instead of superintendents, and their rationale for doing things makes reference to business. For instance, Chicago's CEO, Arne Duncan, contends that Chicago's new direction is consistent with sound business practices. Unfortunately, Chicago's practices seem no more sound than those of the man who assembled the bike his way. To date, not one large urban district has undertaken anything like the litmus test, but if a district really were based on sound business practices, the litmus test would be one of the district's top five priorities.


© 2012 Siegfried Engelmann. All rights reserved.