If you had to choose between an evaluator or a marketer

Earlier this week I asked the following question on Twitter:

If you had to choose between an evaluator or a marketer for your #nonprofit org, which would you pick and why?

I had plenty of interest in the question, but only one answer. Ann Emery pointed to a 2010 study by the Innovation Network, the State of Evaluation 2010 report. The salient point Ann highlighted is that in an online survey of over 800 nonprofit organizations across the United States, “fundraising is #1 priority in nonprofits while evaluation is #9…”.

This statistic provides some support for the open secret that organizations prefer investing in marketing over evaluation, but it doesn’t answer the second part of my original question: why do organizations choose marketing over evaluation?

While not intentionally an answer to this question, a nonprofit I have been almost working with for the last year provides some insight. This organization is a relatively young, small nonprofit that has found itself a media darling in certain circles. It pays lip service to a desire to evaluate its findings but insists that anyone who looks at its numbers be properly vetted (read: already a believer in the agency’s approach).

Evaluators are inquisitive and skeptical by nature. A hypothesis test assumes there is no effect, rejecting this assumption only in the face of convincing evidence. Evaluators do the same thing.
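To make that analogy concrete, here is a minimal sketch of the same logic in Python, using a two-sample t-test on made-up outcome scores. The numbers and the 0.05 threshold are mine, purely for illustration, not anything from the organizations discussed here:

```python
# A minimal sketch of null-hypothesis logic: assume no effect,
# and only reject that assumption when the evidence is convincing.
from scipy import stats

# Hypothetical outcome scores for program participants and a comparison group
program = [72, 68, 75, 80, 77, 71, 69, 74]
comparison = [70, 66, 73, 71, 69, 68, 72, 70]

t_stat, p_value = stats.ttest_ind(program, comparison)

if p_value < 0.05:
    print(f"Reject the null (p = {p_value:.3f}): evidence of an effect.")
else:
    print(f"Fail to reject the null (p = {p_value:.3f}): no convincing evidence of an effect.")
```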

This organization (not uniquely) starts from the standpoint that its impact is a given. In that mindset, evaluators can only disprove your asserted greatness. Thinking of it that way, I’m not sure I’d hire an evaluator either.

An investment in marketing, however, brings accolades from the press, photo ops with politicians and the adoration (and financial support) of the general public. So really the choice between marketing and evaluation is a choice between fame and fortune on the one hand, and, on the other, the possibility of uncovering that the project you have invested in for over half a decade doesn’t do what you thought it did.

In this way, choosing the marketing consultant is the only rational choice to make. Well, that is, if your organization’s logic model defines an ultimate goal of self-aggrandizement. If instead your target population is the people your agency aims to serve, and your impact theory defines causal linkages between your interventions and something other than coverage in the New York Times, then an evaluator might be an okay idea after all.

Consumer protections and false advertising

Corporations invest in evaluating their products in part because better products are more competitive in the marketplace, but given the indirect funding nature of nonprofits, where the service recipient is not the purchaser of services, this incentive falls apart.

However, corporations also evaluate their products so as not to run afoul of the various consumer protection regulations placed on businesses, including laws against falsely advertising a product’s effects.

Imagine how different the social sector would look if a similar standard were applied to it. I have written before that evaluation brings truth in advertising to the social sector, but the real benefit would be to those we serve. The media story should be a secondary by-product of a job well done. Instead, getting a good media story is the job, period.

Reporting benefits and harms

When I was in graduate school I had a fellowship that placed me with a community development financial intermediary. The organization, like most agencies in the social sector, was interested in demonstrating the effectiveness of its interventions.

I asked the executive director whether she wanted me to try to figure out what impact their work was having, or if she simply wanted me to report positive outcomes. Depending on how you look at a spreadsheet, you can make any gloomy picture look like rock-star results. To her credit, the executive director asked that I try to develop a real picture, not simply a rosy one.

But most of the pictures painted in the “Impact” sections of organizations’ websites are of the Photoshop variety. There is a lot that is wrong with the way outcomes are reported, and conflated with impact, in social sector communications. One problem I consistently see is reporting positive outcomes while neglecting to report the changes in those who experienced worse outcomes.

For example, the website of a celebrated family-focused startup nonprofit boasts that 20% of children in its program increased school attendance. Sounds great. But what happened to the other 80%? Did their attendance stay the same, or did it get worse? And if so, by how much?

Increases always sound nice, but does any increase always outweigh a decrease? If 20% of students improved school attendance and 80% attended school less, would we still consider the program a success?

Well, we probably need some more information to answer this question, information which is never provided in this type of outcomes advertising. We would at the very least need to know what an increase or decrease in attendance means. A 20% increase in students attending classes sounds great, but if a kid was missing 30 days of school a year and now she misses 29, is the gain really as meaningful as it first sounded?
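To see how much the missing 80% matters, here is a back-of-the-envelope sketch. Every number in it is hypothetical, not this nonprofit’s actual data, but it shows how a “20% improved attendance” headline can coexist with a net loss of school days:

```python
# Hypothetical cohort of 100 students; all figures are illustrative only.
total_students = 100
improved = 20          # the 20% the website celebrates
worsened = 80          # the 80% whose results are never reported

avg_days_gained = 1    # e.g., a student who missed 30 days now misses 29
avg_days_lost = 2      # assumed average decline among the other 80%

net_change = improved * avg_days_gained - worsened * avg_days_lost
print(f"Net change in days attended across the cohort: {net_change}")
# -> -140 days: a "positive outcome" headline can mask a net decline.
```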

More importantly, what would have happened to these kids without the program? We need an estimate of their counterfactual (what the attendance of these youth would have been without this program’s intervention) in order to truly determine whether we think this increase is reason to celebrate (or the possible decrease for a portion of the other 80% is cause for alarm).
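One common way to approximate that counterfactual is a difference-in-differences comparison against a similar group of students who did not receive the program. The sketch below uses invented attendance figures purely to show the mechanics; it is not an analysis of this organization:

```python
# Difference-in-differences sketch with invented attendance figures.
# Average days attended per year, before and after the program period.
program_before, program_after = 150.0, 153.0         # participants
comparison_before, comparison_after = 151.0, 155.0   # similar non-participants

program_change = program_after - program_before            # +3 days
comparison_change = comparison_after - comparison_before   # +4 days

# The estimated impact is the change beyond what the comparison group
# experienced anyway (attendance may have risen district-wide, for example).
estimated_impact = program_change - comparison_change
print(f"Estimated program impact: {estimated_impact:+.1f} days")  # -1.0 days
```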

Ultimately this comes down to a question of what I call reporting believability. Most of the impact results I have seen on organizations’ websites are simply not believable, as they tend to claim outrageous and unsubstantiated gains.

But these unsubstantiated claims of impact are big business. And if the social sector wants to truly move toward evidence-based programming, we need to figure out how to make it more profitable for organizations to report credible data instead of fantastical folly.

Too many indicators means a whole lot of nothing

Organizations have a tendency to want to collect every data point under the sun. I cannot tell you how many agencies I have contracted with that aim to collect things like social security numbers and criminal histories when these data points carry no decision relevance, and don’t factor anywhere into the services they offer.

Even if organization executives are not concerned with putting those they serve through exhaustive questionnaires, they should be concerned about how overburdening front-line staff with administering lengthy intakes decreases data integrity. I have long advised my customers to keep their survey instruments short and to the point. The shorter your intake, the more likely you are to have every question answered. And if you are only asking a limited number of questions, every question should have been well thought out and be clearly relevant to decision making.

I’m in the process of working with some organizations to redesign their intake forms. One organization I’m working with was attempting to track over 300 indicators. Back in the original intake design phase, the thinking was (as is common) that the more data you have, the better. In hindsight, my customer realized that trying to collect so many indicators overlooked the implementation reality: it’s a lot easier to say what you want than to go out and get it.

The following histogram shows the number of questions on the y-axis by the number of times those questions were answered on the x-axis, over one year, for this particular organization. Half of the questions were answered about ten times, and one-third of the questions were never used at all.

To be clear, this is not a case where the front-line staff were not collecting any data at all. There were a handful of questions with around 3,000 answers, and a reasonable number with between 500 and 1,500 answers. The questions with the most answers were indicators that every front-line staffer found important, such as race and sex. The reason the answer counts vary so greatly is that with so many questions to answer, no staffer was going to answer them all. Therefore, each staff person used her or his own judgment as to which questions were important to answer.
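If you want to run the same check on your own intake data, a sketch like the following will do it. It assumes a simple layout, one row per recorded answer with a question_id column plus a separate list of all questions, which is my own hypothetical schema and file naming, not this organization’s actual database:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: one row per recorded answer, with a question_id column.
answers = pd.read_csv("intake_answers.csv")
all_questions = pd.read_csv("intake_questions.csv")["question_id"]

# Count how many times each question was answered over the year,
# including questions that were never answered at all.
counts = (
    answers["question_id"]
    .value_counts()
    .reindex(all_questions, fill_value=0)
)

print(f"Questions never answered: {(counts == 0).sum()} of {len(counts)}")

# Histogram: number of questions (y) by how often each was answered (x).
counts.plot(kind="hist", bins=30)
plt.xlabel("Times a question was answered")
plt.ylabel("Number of questions")
plt.title("Answer counts per intake question over one year")
plt.show()
```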

With so many holes in this data set, it’s hard to draw much insight. To avoid running into this problem, organizations should first tie each question directly to an outcome in their impact theories. This discipline helps prevent “question creep”, where new questions are asked out of curiosity rather than for the actions that can be taken with the answers. Second, get front-line staff involved in the intake design process to ensure that all the data they need is being collected and that the questions, as worded, are practical and collectable.
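One lightweight way to enforce that first discipline is to keep an explicit map from each intake question to the outcome in the impact theory it informs, and flag any question with nothing attached. The question and outcome names below are hypothetical, purely to illustrate the check:

```python
# Hypothetical mapping of intake questions to outcomes in the impact theory.
# Any question without an outcome is a candidate for removal.
question_to_outcome = {
    "days_absent_last_term": "improved school attendance",
    "household_size": "housing stability",
    "social_security_number": None,   # no decision relevance -> cut it
    "criminal_history": None,         # no decision relevance -> cut it
}

orphans = [q for q, outcome in question_to_outcome.items() if outcome is None]
if orphans:
    print("Questions with no outcome in the impact theory (consider dropping):")
    for q in orphans:
        print(f"  - {q}")
```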