Why comparing your outcomes to community averages might be misleading

I followed a Chronicle of Philanthropy chat titled "How to Show Donors Your Programs Are Working" earlier this week. While it is encouraging that the social sector is trying to incorporate metrics into our work, data's growing mind share has also brought a rise in some fairly dubious advice.

One piece of advice from this "expert chat" was that organizations should couch their outcomes in terms of community averages. For example, a tutoring program might compare the graduation rate of students in its program with the graduation rate for the school district at large in order to show that its students perform better on average.

I’ve heard this suggestion a lot – and have seen organizations proudly declare that their outcomes are some percentage better than the community’s as a whole.

The problem with this approach – and with pretty much every mainstream discussion of evaluation – is that there is no serious consideration of the difference between a good and a bad comparison group.

The missing counterfactual

In the evaluation literature, a counterfactual is a hypothetical: an estimate of what would have happened to someone in our program had that very person not received our services.

This is pretty tough to do – and the reason evaluation experts prefer randomization is that it gives a good approximation of the missing counterfactual. Randomization lets us take two people we have no reason to believe are different, give the program to one but not the other, and then treat the difference in their outcomes as an estimate of the program’s impact.

The suggestion to use the community as a whole as a comparison group assumes that the people in your program are the same as the people in the community at large with the exception of your services. This is a pretty bold claim.

Let’s go back to our tutoring example. A skeptic like myself might argue that students who choose to participate in a tutoring program are more motivated to graduate high school than the average student. If so, it’s hard to tell whether your program actually made students more likely to graduate or whether the students in your program were so highly motivated that they would likely have graduated anyway.

When we compare the kids in our tutoring program to kids at large, we might be comparing a highly motivated student to a particularly unmotivated student. This is not a fair comparison to make.
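To make the selection problem concrete, here is a quick simulation sketch – every name and number in it is invented for illustration. Students get a random "motivation" score that drives both enrollment in tutoring and graduation; the program itself has zero true effect. The naive community comparison still reports a healthy "impact," while a randomized split of the same applicants correctly finds roughly nothing:

```python
import random

random.seed(42)

# Hypothetical simulation -- all numbers are made up for illustration.
# Each student has a "motivation" score that drives BOTH enrollment in a
# tutoring program and graduation. The program itself has zero true effect.

N = 200_000
motivations = [random.random() for _ in range(N)]  # motivation in [0, 1]

def graduated(motivation):
    # Graduation depends only on motivation, never on tutoring.
    return random.random() < 0.5 + 0.4 * motivation

data = [(m, graduated(m)) for m in motivations]

# Self-selection: the more motivated a student, the likelier they enroll.
enrolled = [(m, g) for m, g in data if random.random() < m]

def grad_rate(group):
    return sum(g for _, g in group) / len(group)

community_rate = grad_rate(data)      # district average, roughly 0.70
program_rate = grad_rate(enrolled)    # self-selected group, noticeably higher

# Naive community comparison: the program appears to "work".
naive_impact = program_rate - community_rate

# Randomized comparison: split the SAME applicants at random. Since the
# program has no effect, treated and control graduate at about equal rates.
random.shuffle(enrolled)
half = len(enrolled) // 2
treated, control = enrolled[:half], enrolled[half:]
randomized_impact = grad_rate(treated) - grad_rate(control)

print(f"naive 'impact' vs community: {naive_impact:+.3f}")
print(f"randomized impact estimate:  {randomized_impact:+.3f}")
```

The point is not the specific numbers – only that self-selection alone can manufacture an apparent effect that a randomized comparison would reveal to be zero.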

Yet we make these comparisons all the time when we blindly compare our outcomes to community averages.

Comparing our outcomes to community averages might be effective from a fundraising standpoint, which was the premise of the Chronicle of Philanthropy chat. But I would argue this particular approach has less to do with “showing your donors your programs are working” and more to do with cherry-picking favorable comparisons that make your outcomes look good.

Insofar as evaluation is more about truth than treasure, simple comparisons of outcomes to the community average can be highly misleading.