Timing: Evaluation’s Dirty Little Secret

Apologies in advance for the slightly sensational headline. I’m trying not to let these blog posts get mired too deeply in the earnest and the technical.

What I’m about to tell you isn’t exactly news, and it is far from salacious. Yet, it is hugely important for thinking about pretty much any change you ever hear or read about. These can be changes in a patient’s health after starting a new medication; the sprouting and blossoming of an iris in your garden in the spring; changes in a baseball player’s batting average after he adjusts his swing; and changes in someone’s earnings after they get a college degree. It applies to virtually anything that involves cause and effect. Open a newspaper, listen to the radio, pay attention to what people talk about at work or over dinner. Most of the time, it comes down to a story of “this happened” or “so-and-so did this,” and as a result, x, y, and z occurred.

So what is the dirty secret with timing? It is this: your result will be heavily influenced by the point in time at which your study is conducted. Bear with me for a minute while I fetch my watercolors and brush, and paint you a fairly typical scenario from the development aid world.

Let’s imagine that, several years ago, a big project to help improve people’s lives was implemented: a new agricultural technique for growing vegetables was introduced in the country of Ruritania. Now the government that funded this project wants to know – how much bigger was the yield as a result? Was the money well spent? Are people better off?

Now imagine that, two years later, a world-class team of researchers has been assembled. They employ the most rigorous research design and methodological skills ever developed. Money is not an issue (this is my absolute best-case scenario). They design and implement an evaluation that asks the right questions, surveys the right people, and is statistically rigorous in all respects. It includes qualitative research to understand why and how changes are or are not occurring. The supervision of the field research and the quality control are outstanding. The results are checked and rechecked for accuracy. It is, in short, a legendary evaluation! The outcome – drum roll, please – is that average yields have increased by 43.5% after two years.

Is that not a terrific result? Yes, most certainly! Our donor is thrilled. But the truth, folks, is that 43.5% is a totally arbitrary figure. Why? Because of the timing. That’s our secret. That lovely figure is culled from just one point in time. Come back in two years, apply the same methods, and you will get another answer. I guarantee it. Come back in 20 years, and you may find that average yields have settled in at between 20 and 30% higher than they were before. Or maybe they’re back to square one, perhaps because of soil depletion (we won’t even get into the 500 external factors that influence yields).

You see, every development follows its own trajectory, and the point at which you take the measurement matters hugely. I owe this insight to Michael Woolcock of the World Bank, who emphasized its importance in a seminar I attended a few years ago on qualitative methods and the limits of quantifying change.

Change trajectories follow their own patterns. Some start slowly and then accelerate. Think of the college graduate and her earnings. After graduation, she works for months in a fast-food restaurant, quits out of frustration and is jobless, but then a year later – bingo! – she lands a high-paying job in Silicon Valley. (If you had measured the impact of her college degree after six months, it would have been disappointing; if you had measured it after one year, it would have been very positive!) On the other hand, some developments start rapidly but eventually fade away. Think of the flower that blooms so prettily in spring but eventually wilts, as she must. Some changes exhibit modest, steady improvement and then level off – think of the patient who, after taking the pills, feels a bit better each day and after a week has made a full recovery.
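Timing’s effect is easy to demonstrate for yourself. Below is a minimal sketch in Python – the three trajectory functions and every number in them are invented for illustration, not taken from any real evaluation – that reads out each of the stylized patterns above at several points in time, showing how the same underlying change can look negligible, impressive, or vanished depending on when you measure it.

```python
import math

# Three stylized change trajectories (hypothetical, purely illustrative).
# Each maps years since the intervention to measured change, in percent.

def slow_then_fast(t):
    # Logistic curve: little visible change at first, then rapid gains
    # that level off (the college graduate's earnings).
    return 50 / (1 + math.exp(-3 * (t - 2)))

def fast_then_fade(t):
    # Rapid early gains that decay back toward zero
    # (the flower that blooms in spring and wilts).
    return 60 * t * math.exp(-t)

def steady_then_plateau(t):
    # Modest, steady improvement that levels off
    # (the patient recovering after a week on the pills).
    return 30 * (1 - math.exp(-1.5 * t))

trajectories = [
    ("slow, then fast", slow_then_fast),
    ("fast, then fade", fast_then_fade),
    ("steady plateau", steady_then_plateau),
]

for name, f in trajectories:
    # The "impact" you report depends entirely on when you measure.
    readings = ", ".join(f"year {t}: {f(t):5.1f}%" for t in (1, 2, 5, 20))
    print(f"{name:>16} -> {readings}")
```

Run it and the fast-then-fade pattern, for instance, shows a gain of roughly 22% at year one but essentially nothing by year 20 – the same trajectory yielding four different “results,” none of them wrong, all of them arbitrary.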

This subject is one of several touched on in a fascinating Freakonomics Radio podcast describing research by Raj Chetty and colleagues on “Moving to Opportunity,” a 1990s program in which poor families in poor neighborhoods were given the chance to move elsewhere. The initial results, in terms of changes in earnings after a 10-year lapse, were disappointing. But when the researchers repeated the study after 10 more years, they found that the very young children in those families, by then of working age, were seeing significant differences. It was a matter of timing (and of realizing that the main beneficiaries, at least in monetary terms, were the toddlers).


What I’ve described above is not an argument against doing evaluations, or against trying to measure impacts as accurately as you can. It is a call to take into account the nature of our world: it is extremely rare for anything to change in a constant, upward direction.

So, what does this mean for evaluations? For users of evaluation research, it means: don’t treat the scientifically precise results you get as immutable. They tell you about the magnitude of a change at a specific point in time. If sustainability is an important issue (and it usually is), you should follow up with evaluations at regular intervals. If that sounds expensive, maybe don’t spend all your funds on one evaluation that will give you precise but chronologically arbitrary results.

For us evaluators, it means that we need to talk about the results we obtain in a way that values accuracy and rigor, yet doesn’t fetishize precision. We need to acknowledge the limitations while also probing more deeply into how changes are occurring. When conducting research, we must consider the trajectory of a change. We should take a deeper interest in the chronological context by asking questions about how the changes have come about: how quickly or slowly? What do people experiencing or observing the changes expect in the future? This may not be a scientific method, but I would argue it embodies the scientific spirit by asking important questions about how a change transpired and what path it took. It means recognizing the ephemeral. It means accepting that, in most cases, we grasp at answers at a moment in time, and that tomorrow those answers might be completely different.


Making the case for credibility

In a February 3 op-ed for the Washington Post on journalism, Ted Koppel wrote that “we are already knee deep in an environment that permits, indeed encourages, the viral distribution of pure nonsense.” What is disconcerting is that many people may not care, as long as the nonsense aligns with their worldview.

Take note, evaluators – and anyone for whom collecting evidence is important. If until now the critical issue was ensuring your evidence was credible, henceforth the challenge may be convincing others that credibility even matters. We have entered an era in which information has gone from being something more or less firm to something fluid.

The term ‘post-truth’ was selected by Oxford Dictionaries as the word of the year for 2016, defined as “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.” While Oxford Dictionaries has highlighted a serious problem with its selection, I think it would be more accurate to call post-truth the euphemism of the year. Post-truth just smells Orwellian; its academic-sounding prefix adds a gloss of respectability to an insidious practice. Post-truth is not related to truth in the way post-modernism is related to modernism. The term hardly deserves to be dignified. A better description is ‘anti-truth,’ which would more accurately and honestly convey what happens when half-truths and falsehoods are spread, aimed at degrading the consensus on reality and contaminating public discourse.

Yes, there are grey areas when it comes to information. It can have multiple meanings. You can argue opposing sides by marshaling selective facts to make your case. (Lawyers are trained to do this.) Thus, it is accurate to note that under the Obama administration (January 2009 to January 2017), unemployment fell from 7.8 to 4.8 percent, which is a good thing. But you can also point to a fall in labor force participation rates over the same period, from 65.7 to 62.9 percent. Not such a good thing. But in order to have a meaningful argument rather than, say, a shouting match, the basic facts must be accepted by, and accessible to, all. If one side says, “we don’t trust the US Bureau of Labor Statistics (where these data come from); they’re just a bunch of liars,” then there is no basis for conversation.

What seems to be occurring is that one side has become increasingly less interested in engaging in a meaningful argument and is happy to make stuff up – i.e., to invent facts. And when credible evidence is produced, it is now often derided as false. Take the controversy over Obama’s birth certificate: although the certificate was made available in 2011, as of 2016, 41 percent of Republicans in an NBC News|SurveyMonkey poll disagreed with the statement that “Barack Obama was born in the United States.”

Will this skepticism of credible sources filter down to the technical research work conducted in the social sciences? Let us hope not, although the new Administration’s gag order on scientists in federal agencies is not encouraging. We may have to confront a whole new dilemma. No longer will it be sufficient to provide credible evidence, transparency of methodology, and detailed information on sources. We may need to defend the very concept of credibility – to make a case, to those who disagree, for why credibility matters.