The trouble with benchmarks

I just got back from the annual Independent Sector conference, which brings together non-profits, foundations, and self-promoting consultants (sorry about that) to discuss the direction of philanthropy.

The theme of the conference echoed that of the online philanthro-sphere: nearly every session and discussion had something to do with data. I had some nice chats with folks about the emerging Markets for Good platform, attended a session on using communal indicators to drive collective impact, and heard one too many pitches about how this or that consulting firm had the evaluation game pretty much locked up.

What many of these conversations had in common was a focus on setting benchmarks to compare progress against. Benchmarking is a quick and dirty tool for trying to estimate an effect over time, but it can be misleading, a fact that was not discussed in any of the sessions I attended.

Benchmarking essentially requires measuring an indicator at an initial point in time and using that initial measure as a baseline for future comparisons.

For example, a workforce development program might measure the percentage of people in its programs who found work in the last year, using that percentage as a baseline to compare future employment rates against. A year later, the program would compare this year’s employment rate against last year’s benchmark.

In this simplistic scenario, one might assume that if the employment rate is better this year than last year’s baseline, then the program is doing better, and if this year’s employment rate is below the baseline then it is doing worse. But, as the title of this post gives away, there are some things to consider when using baselines.
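To make that logic concrete, here is a minimal sketch of the comparison; the employment rates are made-up numbers, not data from any real program:

```python
# A toy version of the naive benchmark logic: subtract the baseline
# from the current rate and read the sign as "better" or "worse".
# Both rates are hypothetical, for illustration only.

baseline_rate = 0.62   # share of participants employed in year one
current_rate = 0.58    # share of participants employed in year two

change = current_rate - baseline_rate
verdict = "doing better" if change > 0 else "doing worse"

print(f"Change vs. baseline: {change:+.2f} -> program is {verdict}")
```

This is the whole of the method: a subtraction and a sign check, which is exactly why it is quick, dirty, and easy to misread.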

As I have written in the past, the social sciences are particularly complex because so many external factors outside our program interventions affect the lives of those we aim to serve. In the employment example, a worsening economy is likely to have a larger effect than the employment services themselves, all but assuring that next year’s employment rate will fall below the baseline, even if the program was more effective in its second year.
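One way to see why: treat the observed change as the sum of a program effect and an external, economy-driven effect. The numbers below are entirely hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical decomposition of a year-over-year change in employment rate.
baseline_rate = 0.62     # last year's benchmark
program_effect = +0.03   # the program actually improved in year two
economy_effect = -0.08   # a worsening economy swamps the program's gains

observed_rate = baseline_rate + program_effect + economy_effect

# The benchmark comparison sees only the net change, so it reads a
# positive program effect as failure.
print(f"Observed rate: {observed_rate:.2f} (benchmark: {baseline_rate:.2f})")
```

The benchmark alone cannot distinguish this scenario from one where the program genuinely got worse; the subtraction produces the same negative number either way.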

Under the collective impact benchmarking model, we would collectively flog ourselves for results outside of our control. We can also see the opposite effect, whereby we celebrate better outcomes against a previous benchmark when the upward swing is not attributable to our own efforts.

So, is benchmarking useless? No, but it should not be confused with impact. A mantra I preach to my clients, and brought up in many conversations at the Independent Sector conference, is that it is just as important to understand what your data does not say as what it does.

The simple difference in two outcomes from time A to time B is not necessarily program impact, and cannot necessarily be attributed to our awesomeness.

Benchmarking tells us whether an outcome is higher or lower than it was in the previous period. But subtraction is not really analysis. Analysis is trying to tease out the “why.” Why did an outcome go up or down? Was it the result of something we did, or of external factors? If the change is attributable to external factors, what are those factors?

In short, benchmarks can help you figure out which questions to ask, but benchmarking itself does not provide many answers.

I’m encouraged by the buzz around using data and analytics, but I’m wary that data is kind of like fire: we need to know what we are doing with it, lest we set ourselves ablaze.