Being there: the value of going to the field

Sometimes, at the tail end of another 20- or 30-hour trip, say, to conduct fieldwork for an evaluation, after crossing as many as 11 time zones and spending two nights on airplanes, I ask myself – why? What is the point of traveling 7,000 miles? Couldn’t we have just had a few phone calls? In this day and age of instant communication, Skype and other video chat applications, is it really necessary to be physically present for a meeting? Just think about the time and money spent on business travel. The Global Business Travel Association estimates that global spending on business travel topped $1 trillion about 5 years ago and has been expanding ever since. That amount represents, give or take, 1% of the entire global economy. That is a lot – roughly the size of Mexico’s GDP. Basically, you can think of it as the marginal value, to the public and private sectors, of face-to-face meetings vs. phone, Skype or videoconferencing.

I was prompted to write this blog post by the latest issue of New Directions in Evaluation (Winter 2017). The entire volume, edited by Randi Nelson and Denise Roseland, is devoted to field visits – how and why they are a key element of conducting evaluations, among other things – a somewhat neglected issue in the evaluation literature. The discussions and articles have caused me to reflect a bit on the importance of going to the field (i.e. being physically present in the area where the program is being implemented) in my own work in international development.

There are two main aspects to field visits, which in evaluation are defined as going to the place where the program or project is being implemented: seeing the site and meeting the people. The first is not always available – there might not even be any physical thing to see, such as a new school, equipment, newly installed energy-efficient boilers, or what have you. The second is the real point: you need to be there to meet with people in their own environment. And long trips and jet lag are actually the least of it. Field trips within countries can involve long car journeys to remote rural areas along terrible roads, in countries where road safety is somewhat of an afterthought, if not a downright inconvenience. Auto fatalities pose a real risk to international development professionals. A few years ago I was involved in an incident in which the car I was riding in spun and flipped over on a country road in southern Malawi. I was lucky: I escaped with minor bruises. (Use that seat belt, folks.)

Yet, if you ask me whether I’d rather do all my analysis from the comfort of my own home, based only on phone interviews or a desk review, I would respond…well, sure, but don’t expect the same depth of analysis or the same quality of information. It is much easier to get a sense of a project when meeting people – implementers, beneficiaries, other stakeholders. Doing a focus group by phone or video? Forget it. Take a look at the photo below, from a field trip I was on in Tajikistan in 2011. Imagine talking to those people from the village by video-conference.

There is also a very practical reason for preferring face-to-face meetings: when you can see the other people in the room, you can avoid the awkward pauses and people speaking over each other.

Conducting many types of analysis is not just about collecting information. A key factor in getting people to open up to you, and reveal their thoughts, is trust. And when you have never met a person, you can’t look them in the eye, you can’t read their body language. When people get a measure of you (the evaluator, researcher, entrepreneur, etc.) as a person, they tend to talk more. Until you see someone in the flesh, both of you are quite literally disembodied, and not quite real. Who in the online dating scene hasn’t had such an experience? Before you met the person they may have seemed terrific, almost perfect, but once you meet them, you have a completely different (and often disappointing) impression. The digital world, the world of two dimensions, still doesn’t hold a candle to the world of flesh and blood! Whether you’re evaluating a project or a potential mate, you gotta get yourself up off the sofa and get out there. For whatever reason, being physically present is a big step in getting closer to that ever elusive truth.


Evaluating New Year’s Resolutions

It’s New Year’s Eve! Ready to take the plunge into 2018?

I find that the cold and inclement weather and long nights in the northern hemisphere are especially conducive to reflection and inner work. Many people use the holidays to mull over the past year. I like to set aside some time to review what it was all about (if anyone can explain what happened in 2017, please let me know), and figure out what to do differently, starting with the metaphorical clean slate we all get on January 1.

You see, around this time of year, many of us turn into amateur evaluators. When making New Year’s resolutions we are essentially evaluating ourselves, are we not? It’s safe to say that you don’t make a resolution unless you’re dissatisfied with something. So, we reflect on the past year: what went well, what didn’t go so well, what was achieved, what wasn’t.  And based on this self-evaluation, we think about what we want to do better, or differently, next year. That’s similar to what evaluators do when they take on a new assignment. Only in this case, the focus is more personal. And you don’t get paid for it. Which may explain why so many of us fail.

But resolutions are difficult to stick with. According to surveys, while almost half of Americans make them, less than one in ten keep them. I freely admit that I often fall short. But now and then I succeed in sticking with something. This is what I have found works for me:

Monitor yourself. Evaluators rely on data, and time-series data is an excellent way of measuring change over time. Measuring something is the first step. The management thinker Peter Drucker reportedly said that to improve something, it must first be measured. I have found that just tracking something gets me interested in doing more of it (or less of it, if it’s a bad habit). Many people track their spending as a way of managing their family finances. I’ve kept a list of every book I’ve read since 1986. I also keep a log of how many hours I work. Monitoring equals awareness, and awareness forces one to decide how much to value a particular activity, and whether to increase, decrease or change something about it.

Divide and conquer! If your resolution involves a major project, divide it into smaller tasks and take them down one by one. Otherwise, it can all seem overwhelming, and the temptation to throw in the towel and say ‘the heck with it’ may be hard to resist. Take your big fat goal, sit down, and make a list of all the steps to reach it. This might involve collecting information (how to fix that leaky ceiling or furnace, where to take language classes), talking to people, getting the tools for the job, setting aside time. The steps should be manageable and not too difficult or time-consuming. Then, one by one, go through your list, checking off the tasks as you complete them. I do this when I have to write reports.

Seek help. It is easier to accomplish something if you have support. It can come from a mentor, a coach or teacher, even a friend or colleague if you share similar goals. I took up the piano in my 40s after a 20-year hiatus, and found I could still play and enjoy it. But I wanted to play better, so I found a piano teacher.

Reward yourself. Incentives, if they’re designed properly, tend to work. Now while it may be difficult to find someone to pay you for achieving your goals (and good for you if you can!), perhaps you have some guilty pleasures (assuming you’re not a robot) that can serve instead – things you like to eat, buy or experience but feel you shouldn’t indulge in that often. Why not make a deal with yourself? A reward for certain milestones on the path to achieving your resolution.

Develop some low-effort habits. If your resolution is about changing something in your life – maybe learning about a new area, developing a new skill, or reducing your screen time – try making a small adjustment to your routine. For example, you might set aside 10 minutes in the morning for working on a new language, for meditation, or for exercising. Or you might commit to not checking your email except at prescribed times during the day (recently recommended by Tim Harford of the Financial Times). If the change is not an onerous one, very quickly it may become a habit. You have to figure out what the best conditions are. In 1997 I began keeping a journal. Over two decades later, I still spend 10-20 minutes writing every morning, after shaving and before eating breakfast. I found that the trick was to get the writing done in the morning, not in the evening, by which time my cognitive resources are depleted and I don’t want to have any more obligations.

None of the above may work for you – I myself probably fail more often than I succeed – but it doesn’t hurt to try. Something might stick. Remember, the bottom line is to take whatever resolution you come up with, cut it up and organize it into chunks that are small and easy to digest.

As to my own resolutions for 2018? Post more blogs, for one.

Happy New Year and good luck with achieving those goals!



The temptation of conflict vs. balance in reporting

The title of this blog post is probably a bit abstruse, and not quite as dramatic as King Kong vs. Godzilla. Bear with me while I explain.

In writing, there is a temptation to focus on problems, on challenges, on things that aren’t working. These are all conflicts of one sort or another, and conflict is grist for the mill. Conflict grabs our attention. Why else do we exchange gossip, follow sports, watch the news, read history, go to the theater, and slow down to gawk at car accidents? Every one of these “leisure” activities revolves around conflict of one sort or another.

Fundamentally, conflict is one of the cornerstones of the dramatic arts – drama, tragedy, and yes, even comedy. “Conflict creates comedy,” as Jeff Garlin of the HBO show Curb Your Enthusiasm summed it up. (On a side note, I spent my twenties pursuing an acting career. I moved on when the conflict between the dream and the realization that I wasn’t good enough became too severe. I have no regrets.) I would go so far as to argue that it is human nature to revel in conflicts, their escalation and resolution.

However, for most people conflict is something to avoid in their own lives, the current US President being a stark exception. We prefer to watch from the sidelines, from the safety of our living rooms and mobile devices.

This theme can obviously be taken in all kinds of directions. In line with this blog’s main focus, I’m going to talk about evaluation. The ‘temptation of conflict’, by which I mean the temptation to focus on conflict, can in fact creep into evaluation work. There is always potential for tension to arise between the evaluator and the evaluated. The former is looking for what works and what doesn’t, and there is always something that doesn’t in this complex world. It can take experience and diplomacy to put a client at ease, to avoid losing their cooperation, and to help everybody get the most out of the evaluation.

The temptation to focus on conflict can also come into play during the writing process. In my own experience, writing up an evaluation report is generally easier when a project or program has gone badly than when it has mostly gone swimmingly. There is more to say, more to analyze, and more to recommend. If we consider the term conflict to encompass disconnects, tensions, misunderstandings, and misinterpretations, then it is not difficult to uncover conflicts between what was planned and what was implemented, between stakeholders, between donors and government, between expectations and reality, and on and on.

Thus, I have found that when conducting a project evaluation, there is a natural tendency to zero in on what doesn’t work. If you are so unlucky as to review a project that is splendid in every way, and have nothing to say in your report, whoever commissioned the evaluation will think you didn’t do a thorough job.

While projects should indeed be analyzed and accompanied by well-thought-out recommendations, the final evaluation report needs to strike the right balance. For example, if the overall assessment of a project is, let’s say, moderately satisfactory, the substance and tone of the report should reflect that.

I was once on a team evaluating a US government development project. Multiple agencies were involved in providing technical assistance to countries on three different continents, and the coordination they achieved was quite remarkable. We, the evaluation team, agreed that it was an example of an effective, well-designed and well-implemented project. We noted numerous accomplishments, along with a few weaknesses. However, the tenor of our report, which methodically addressed each question in turn, was rather negative. (As is customary, the client who commissions an evaluation provides the questions they’d like evaluators to answer.) For each question, we were able to identify some weakness, even while acknowledging the achievements. No one is perfect, after all. Upon reading the report, the project implementers were somewhat bemused. They, justifiably, expressed concerns that we had made the project sound, well, kind of mediocre. This was not our intention. Simply put, there was a lot more to say about the problems (i.e. the conflicts), even if they were modest, than about what had gone well! There was more to analyze and there were more factors to unpack for the negatives than for the positives. We did cover the good stuff, but not at the same level of depth. There were fewer issues to dig into, and not many recommendations linked to them.

One lesson I’ve taken from this and other evaluations is that you must make an effort to balance positives and negatives. While problems (and remedies) should be highlighted, the overall tone of the report should reflect the overall assessment of the project. If you find a few twisted or dead trees, do not portray the whole forest as if it were damaged. Of course, you should not neglect or avoid detailing any negative aspects in your findings. But don’t give them more importance – or less, of course – than is warranted. Be objective,  rely on evidence, and be fair.


Check your outlier – is it a symptom or an anomaly?

The shocking United Airlines ejection of a passenger was an outlier

This week I’d like to talk about outliers. These are people, events, or data points that are so far from the norm that they attract unusual amounts of attention. Outliers make the news. Most of the news stories you read concern exceptions or unexpected events. They grab our attention. The disturbing scene of a paying passenger being forcibly and violently ejected from a United Airlines plane represents just such an outlier.

Last week, a passenger was brutally dragged off a United Airlines flight in Chicago by security guards who broke his teeth and nose, leaving him with blood streaming down his face and a concussion. He had refused to give up a seat he had paid for, and United decided that he and three other passengers (chosen randomly according to some algorithm) had to leave the plane. The airline had offered travel vouchers (of $800) but no one had taken them, apparently. So it decided – time to draw straws. And what for? So space could be made for four airline employees, arriving at the last minute for the flight bound for Louisville, Kentucky.

The scene of security personnel pulling the 69-year-old Asian American (he was born in Vietnam) through the aisle, to the horrified looks and gasps of other passengers, was filmed on smartphones and duly posted to the web. It has generated a huge outcry and calls for a boycott of United. The company made things worse when CEO Oscar Munoz issued a terse non-apology, blaming the passenger, Dr. Dao, for being “disruptive” and “belligerent”. Munoz wrote: “I apologize for having to re-accommodate these customers,” which is about as far away from saying sorry as it gets before entering antonym territory.

Meanwhile, there are plenty of news articles predicting that the public will get over it, and United Airlines will weather the storm. This is because many companies have overcome this type of scandal in the past, and because many customers don’t have a lot of choice when it comes to airlines. Consolidation among the big legacy airlines, blessed by the regulators, has ensured that.

The shocking incident is highly unusual, which is why it generated so much attention. We are used to hearing about passengers being escorted from flights for misbehaving, but in this case it was the airline which misbehaved (even if it did outsource the job to airport security).

Outliers – good for the news, but not so good for research?

In the research world, outliers are not a news opportunity. Most of the time, they’re viewed as a problem, sticking out like a sore thumb. Outliers raise questions about data reliability and validity. They distort the mean, leaving an inaccurate impression, even if the data is 100% correct. (Remember the one about how Bill Gates walks into a bar and suddenly everyone is a billionaire, on average?) The solution? Outliers are typically ignored or dropped from datasets by researchers.
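To make the barroom arithmetic concrete, here is a minimal sketch in Python (the incomes are invented) showing how a single extreme value drags the mean while barely moving the median, and how a simple interquartile-range rule of thumb flags the value that needs a closer look:

```python
import statistics

# Hypothetical annual incomes ($) of ten people in a bar
incomes = [32_000, 41_000, 38_500, 55_000, 47_000,
           29_000, 61_000, 44_000, 36_000, 52_000]

print(statistics.mean(incomes))    # ~43,550 – a fair summary
print(statistics.median(incomes))  # 42,500

# Bill Gates walks in (the figure is purely illustrative)
incomes.append(100_000_000_000)

print(statistics.mean(incomes))    # ~9.1 billion "on average"
print(statistics.median(incomes))  # 44,000 – barely moves

# A simple IQR rule to flag outliers before deciding what they mean
q1, _, q3 = statistics.quantiles(incomes, n=4)
iqr = q3 - q1
flagged = [x for x in incomes if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(flagged)  # [100000000000] – the data point that deserves scrutiny
```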

But outliers can mean different things. They can be symptomatic or anomalous. Maybe an outlier highlights a larger problem, represents the tip of the iceberg, a leading indicator, a canary in the coal mine, i.e. a symptom of some larger phenomenon or a trend that is about to break. Or maybe it represents an exception to the rule, a bad apple, and can justifiably be disregarded. It is also possible that the outlier is an error. Before deciding how to deal with them or react to them, we need to understand what they mean.  If they’re signaling something, then even researchers need to take a closer look at them. Like a doctor, you should check your outlier in order to make a diagnosis.

In a 2002 article, Vijayendra Rao advocates “having tea with an outlier,” i.e. looking more closely at what they represent, and maybe even talking to them if they represent a survey respondent, to get a different perspective on the issues.

The outlier may have different intrinsic characteristics that set it apart. I was once asked to find a poor person in a village in Kazakhstan, in a region where I was conducting an evaluation for the Asian Development Bank. It wasn’t easy, because the people we talked to didn’t really consider themselves poor and had to scratch their heads when we asked them to point us toward a poor household. Finally, my team members and I were directed to the home of a single mother. We brought her some groceries and knocked on her door. She invited us in and we sat down and talked to her. It turned out that she had emigrated from Ukraine (I can’t remember whether her husband had died or merely left her) and thus lacked a social support network. She had problems with her papers. There were also health issues. I don’t believe we actually had tea. She was an anomaly, an exception. She didn’t represent the typical inhabitants of the region. While we learned about what kind of factors might drive people into poverty, her case didn’t tell us much about poverty issues among the population as a whole.

The case for symptomatic

But if the outlier represents an extreme case of a phenomenon that is happening to a lesser degree elsewhere, then it takes on a different meaning. What do we have with United? I would argue that the case is symptomatic, not anomalous. Indeed, although the incident was well outside the norm (the chances of a passenger getting bumped from a flight remain about 1 in 10,000, and the chance of losing your teeth in the process remains vanishingly small), the mood against United has been building for some time, and that helps explain the outrage. United certainly did not have a good customer service reputation before the incident, and the extreme mistreatment represents, in many people’s eyes, all that is wrong with the company. The frustration and anger over poor service boiled over. United’s reputation was already solidly second rate: it ranks 66th out of 100 global airlines according to one survey.

My own experience flying United is far from pleasant, and presumably widely shared. Anyone who has flown with a European, Gulf region or Asian airline will know that US carriers in general deliver poor service. While with the better international carriers you might feel as though you were their guest, on most American carriers you feel like a revenue source that, inconveniently, must be processed, takes up physical space, and requires (minimal) attention. They get away with it because there is little competition, and because they know passengers will put up with it in exchange for relatively low prices.

The irony is that Americans on the whole don’t tend to be unfriendly; quite the opposite. But once they are hired by an airline, I can only assume that they are processed through some kind of training module which strips away as much of their humanity as possible (although you will occasionally interact with a friendly crew member or ground staff member whom the system clearly failed to process).

Finally, CEO Munoz’s initial reaction – to back his staff and essentially blame the passenger – was very telling and fairly indicative of United Airlines’ attitudes in general. Judging from that initial reaction, for Munoz the incident was not such a big deal. In other words, it was within the bounds of normalcy. That suggests it was symptomatic, not anomalous. Granted, Munoz did issue a proper apology days later, but that was tainted by the strong suspicion that it was a reaction to the airline’s share price dropping a bit, and not some sort of recognition that its approach to customers in general is woefully lacking in common decency.

So while the news articles arguing that United will survive this debacle may be correct, that doesn’t mean this particular extreme behavior signifies nothing. I believe the evidence supports the view that it reveals a lot. It is symptomatic of a much larger problem – in a word, disrespect toward passengers, those important but still annoying revenue streams.

(If you’re curious, I will do my best to avoid giving United my business in the future, even if it costs extra.)

Inadvertent airline humor

I leave you with some one-liners. They are taken verbatim from a link on United Airlines’ own website called, without a trace of irony, Shared Purpose and Values:

  • We Fly Right: On the ground and in the air, we hold ourselves to the highest standards in safety and reliability. We earn trust by doing things the right way and delivering on our commitments every day.
  • We Fly Friendly: Warm and welcoming is who we are.
  • We Fly Together: As a united United, we respect every voice, communicate openly and honestly, make decisions with facts and empathy, and celebrate our journey together.
  • We Fly Above & Beyond: With an ambition to win, a commitment to excellence, and a passion for staying a step ahead, we are unmatched in our drive to be the best.

Even setting aside the passenger ejection incident, anyone who has ever flown on United – an airline with some of the most unhelpful and unfriendly employees in the world – will be forced to acknowledge that they do have a dry sense of humor.


Give me a number, any number

When interviewing people as part of an evaluation, at some point I like to put them on the spot.

The interviewees will be well-informed about the program under evaluation. That’s how they were selected. They might be policy makers, managers, program implementers, sector specialists or some other type of what we in the business call “key informants,” or “KIs.” The interviews are semi-structured, with a pre-determined set of questions or topics. That means the answers can be open-ended, in contrast to surveys, where most responses need to be kept succinct. The open-ended format allows the interviewer to probe, follow up or clarify particular points. It’s not so different from how a journalist interviews a subject, or a police detective interviews a suspect. It’s a process of discovery as well as a matter of answering straightforward questions about the program.

During the interview, as we go through the questions and the respondent shares her assessments and opinions (often in many shades of grey) I’ll press her to take a stand and defend her position. I’ll ask for a number. I want a number that summarizes, say, her subjective assessment of the program.

Sure, I like words. I’m a writer, after all. You can learn a lot from words, but there are just so darn many out there! In its ability to synthesize a story, a number can almost be poetic…

In the middle of a discussion about Program A, I’ll ask something like: “Now, based on everything you know about this energy development project, how would you rate its impact on a scale of 1 to 5, where 1 means you noticed no impact at all and 5 means you noticed a very strong impact?” The key informant will come up with a number, say a “4.” I make a note of it, and then follow up with something like “Please elaborate. Tell me why you rated it a 4?” And the interviewee builds a case, offering more details, providing a rationale.

Sometimes the response you get is a surprise. A key informant will criticize a program left and right, but then rate it 4 out of 5. Why the apparent disconnect? Apparently all those criticisms carried less weight for the respondent, in the grand scheme of things, than I had assumed. You see, if I hadn’t asked them to give me a number, I might well have walked away from the interview thinking, “Hmm, they thought that program sucked.” In fact, they thought the program was pretty good, but just had some caveats. You can find the opposite scenario as well, of course. A person speaks positively about a program, then rates it a 3. It could be they just had very high expectations.

These are scale questions, referred to as Likert-type scale survey questions after the psychologist who invented the concept, Rensis Likert. Of course, such rating questions are extremely common in surveys. Who hasn’t responded to an online or in-person interview that included a scale question? Online sites including Amazon, Netflix, TripAdvisor, and Yelp use a similar approach to get us to rate products or services.

The concept has something in common with the efficient market hypothesis, which states that share prices reflect all currently available information. All the negatives, positives, and expectations are priced into that one number. Doctors use a pain scale to gauge a patient’s chronic pain. Therapists will ask their clients to rate their depression using a scale. Similarly, evaluators might use a scale to understand the degree of effectiveness of an intervention, or any number of other issues.

Typically, the scale question is used for closed-ended interviews, as part of surveys. Responses can be analyzed and used to obtain an aggregate measure for all 1,243 respondents, say, as well as for different subgroups. For example, you might find that, on average, female participants rate the program’s ability to improve their lives at 4.3, while male participants rate it 3.8. Done well, this line of inquiry can be a valuable method for taking a population’s pulse on an issue.
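As a hypothetical illustration of that kind of subgroup comparison (the data, column names, and figures below are invented, not taken from any real survey), a few lines of Python with pandas are enough to turn a column of 1-to-5 ratings into averages by group:

```python
import pandas as pd

# Invented survey extract: one row per respondent
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4, 5, 6],
    "gender": ["female", "male", "female", "male", "female", "male"],
    "impact_rating": [5, 4, 4, 3, 4, 4],  # 1 = no impact, 5 = very strong impact
})

# Overall average rating across all respondents
print(df["impact_rating"].mean())

# Average rating by subgroup, with counts so that small cells are visible
print(df.groupby("gender")["impact_rating"].agg(["mean", "count"]))
```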

The nice feature of open-ended interviews with key informants is that you are able to do a little digging after they’ve coughed up a number. In a survey, if you ask respondents to explain their answers, things can get complicated – those longish answers can’t be easily summarized numerically. (Of course, open-ended answers can be coded according to type, but then you lose a lot of that rich detail along the way.) You don’t face that constraint when conducting qualitative research. You face other constraints instead.

I find that rating questions are a good way of cutting through the dense jungle of information you can be pulled into when doing research. You’re walking along, making your observations of the flora and fauna, taking as much in as you can, using your machete to clear the way, trying to figure out whether you’re heading in the right direction (i.e. testing incoming information against your hypothesis, the path which may or may not lead you to the truth). And then you emerge onto a rocky outcropping…and all at once see the whole rainforest spread out below you. Aha! So that’s how the program looks.

Rather than finding it a reductive or sterile exercise, I’ve found that people being interviewed rather like this type of questioning. They appear to enjoy the exercise of mentally processing large amounts of information on a subject to generate a single number. And they like explaining how they got there.

Essentially, this is a way to leverage people’s cognitive functions. You’re engaging them in a kind of meta-cognition exercise, in which they examine and explain their own thought processes.

Try it out on yourself. On a five-point scale, rate your satisfaction with, say, the place you live; your own job performance…or how much of your life you spend online. Then justify that number in words. You will most likely find that your brain immediately begins sorting through a whole succession of factors, lining up the pros and cons, weighing them against each other.

It may be that I’ve spent far too much of my life evaluating stuff, but I honestly find this exercise quite revealing, stimulating even, in a cerebral sort of way.


Don’t shoot the messenger (or evaluator)

Congressional Budget Office logo

The non-partisan Congressional Budget Office (CBO) last week released its cost estimate of the American Health Care Act, the Republicans’ plan to replace the Affordable Care Act, colloquially known as Obamacare.

The CBO looked at a range of impacts. The headline numbers from the CBO estimate are a reduction in the federal deficit between 2017 and 2026 by $337 billion and a total of 52 million uninsured by 2026 (with 14 million losing insurance next year). There’s something to like (deficit reduction) and something to dislike (loss of health insurance for millions), depending on where you stand on these issues. Without passing judgment on the significance of the potential effects of the new bill, let’s focus on the reaction of the bill’s backers, including the White House, to the CBO and its work.

Even before the CBO report was published on March 9, potshots were being taken at the normally highly respected office. Forbes characterized them as a “pre-emptive, coordinated attack.” Joe Barton, a Republican former House Energy and Commerce Committee Chairman, had this to say about the CBO: “I don’t know what they do, they sit around and think great thoughts and everything on the issues…One of the things we need to do is reform the CBO folks.” And Gary Cohn, director of the White House National Economic Council, said on Fox News that “in the past, the CBO score has really been meaningless.”

The reactions suggest that some supporters of “repeal and replace” already sensed that the new healthcare proposal would not deliver on Trump’s professed goal of providing all Americans with great healthcare at lower cost than Obamacare. It is also worth remembering that the CBO director, Keith Hall, was named to his post by Republicans. This doesn’t mean that the CBO always gets its numbers right. It doesn’t. But its analysis is transparent and explained in enough detail that one can understand how it reaches its conclusions.

As an evaluator, part of whose work involves estimating the impacts of policy reforms, I can sympathize with the CBO being targeted for attack. Conducting evaluations, which is essentially what the CBO has done in tallying the costs and benefits of replacing Obamacare, is a great way to lose friends and alienate people. Evaluators are never the most popular kids on the block. We don’t control pots of money, we aren’t trumpeting success stories, our job doesn’t involve being ingratiating in order to sell stuff. We dig around and find out what worked and what didn’t, who’s winning and who’s losing.  It’s necessary (and hopefully useful) work, but it’s not a popularity contest. And evaluations always turn up shortcomings. Nobody’s perfect. As the messenger, you can expect to get (metaphorically) shot at.

At a minimum, people get a bit nervous when their organization or program is evaluated. Even if the client who commissions the evaluation outlines the questions they want answered, evaluators are still being allowed ‘inside’; they’re able to ask questions of pretty much anyone connected to or benefiting from the project. Good evaluators pry through reports, extract data from whatever sources they can get their hands on, and double check everything they hear. Sometimes, the evaluation can seem a lot like an investigation.

I’ve conducted evaluations all over the world, some of them under fairly hostile circumstances. Even if the main client wants to have evidence on the impacts of a reform, that doesn’t mean everyone wants to know. There are potential winners and losers who have a stake in the outcome of your evaluation. There are vested interests. Trade union representatives, for example, can be a tough bunch. I once worked on an evaluation of the potential impacts of a mine privatization in eastern Serbia. Layoffs were expected. When I conduct this type of work, it is my policy to meet with representatives of all the affected groups. In this case, everyone knew that the restructuring was going to lead to the loss of about 2,500 jobs. It was the task of my evaluation team to estimate what would happen to their income and job prospects afterwards. The concerns of the workers were legitimate and completely understandable from their perspective, even if the mine was dependent on tens of millions of dollars of budget support annually. My approach to dealing with the trade unions was to open a line of communication with them, and keep it open throughout the study preparation, fieldwork and reporting period. This involved meeting with them periodically, listening to their concerns, and explaining what we researchers were doing.

On a similar study, this one collecting evidence on the impact of downsizing Croatia’s shipbuilding industry, we had a very different experience. There was unfortunately not enough budget or time to meet with the trade union representatives more than once. The antagonism toward the evaluation was considerable. Fieldwork included conducting an employee survey in a room on the premises of the shipyards. In fact, our survey supervisor, a young Croatian woman, was asked by a shipyard manager to turn over the list of (randomly selected) employees she was interviewing. When she refused, he locked the door to the room and threatened not to let her out unless she complied with his request. She resolutely stuck to her guns, however, risking her safety and wellbeing in the name of evaluation ethics. Luckily, she was able to call someone in the Ministry from the locked room with her cell phone and secure her own release. But it left her shaken. I have even heard of survey interviewers in some countries being detained and jailed for doing their work.

In some respects, evaluators are indeed like investigative reporters. That makes the work interesting, and occasionally risky. But the evaluator as investigator is not really the ideal association you want to create. It can sound, well, threatening. Another, more constructive analogy is that of the evaluator as a “critical friend.” This concept was proposed a quarter century ago by Costa and Kallick in a 1993 article. They noted that critical friendship must begin with building trust, and went on to highlight the qualities that such a friend provides: listening well, offering value judgments only when the learner (i.e. the client) asks for them, responding with integrity to the work, and advocating for the success of the work (p. 50). As an evaluator, you are not trying to establish guilt, attack, or push an agenda. You are there to help the organization or policy maker better understand the impacts of their programs or proposals, and improve them so that their goals can be attained.

Going back to the CBO’s report, it reads like a levelheaded, thoughtful piece of analysis. If its critics have a problem with it, you might expect them (at least in a less frenzied atmosphere) to respond by questioning its assumptions, or offering counter-evidence. When critical voices fail to do this, it is probably because they don’t have good answers.

This does not mean that, as evaluators, we can be smug.  We live in a world where the idea of “evidence-based” does not have a strong hold on the public’s imagination, and is anathema to many politicians. We need to work harder, and use the evidence we have to tell a more compelling tale.


Are we being surveyed to death?

In just the past few weeks, I’ve received online requests to fill out surveys from my bank, my newspaper, an airline I flew with, a hotel I stayed at, Best Buy, the DC regulatory agency, several graduate students, and my dentist’s office. I’ve been going to the same dentist for almost 15 years – an indicator which suggests I’m quite satisfied with her care. So why am I asked to fill out a survey after every single visit?

Because designing and implementing surveys is one of the things I do for a living, I tend to sympathize with whoever prepares these online surveys and is hoping for a high response rate. However…

Providers of goods and services want customer feedback for marketing purposes and, presumably, for improving their performance. I understand that. Yet, because it is so easy now to conduct surveys via the internet, we are getting bombarded with them. Surveys are just too easy to create and disseminate. All these companies and researchers are asking us to give them our precious time without, however, offering anything in return. Well, for a while, a hotel chain I frequently used would at least offer 250 award points for every customer survey I completed after a stay, but they eventually stopped with that incentive and I, quite rationally, stopped responding. The deal was off. In fact, I’ve mostly stopped responding to any surveys at all, at least of the online variety.

Indeed, people in general are responding less and less willingly to surveys; over the last several decades, there has been a steady decline in survey response rates. This is apparent in annual or quarterly surveys which seek to elicit information on incomes, expenditures, and assets – the type used by governments and researchers to gauge changes in national wellbeing, or inequality. To take one typical case, whereas in 1990 just 12 percent of those surveyed did not respond to the US Census Bureau’s Consumer Expenditure Diary, by 2009 the share had risen to 29.7 percent, as reported by Roger Tourangeau and Thomas J. Plewes. In their 2013 book, Nonresponse in Social Science Surveys: A Research Agenda, they note that survey nonresponse is a growing issue not just in the US, but in all wealthy countries. Of course, this is also the period over which we’ve seen the rapid rise of internet access. Is it a coincidence? Maybe not.

One can easily imagine how survey, or interview, fatigue has become an issue. Survey fatigue generally refers to the phenomenon of a respondent tiring before she has answered all the questions. (Remedies include reducing the number of questions, and making them more interesting and relevant. Tourangeau and Plewes also cite studies showing that female interviewers get higher response rates to surveys, but we’ll save an analysis of the gender angle for another time.) However, another type of survey fatigue, which I can personally attest to, comes from being surveyed just too darn often.

Because they are so cheap and ubiquitous, survey proliferation could be creating problems for researchers attempting to get a better sense of public attitudes, or merely trying to track economic and social trends over time. If non-responses are more common among very high-income households, inequality may be underestimated, a point that Nobel Prize-winning economist Angus Deaton has highlighted. It’s hard to say for sure. We would need to do a survey, to ask folks if they’re tired of being surveyed! While the meta-nature of such an endeavor appeals to my sense of irony, it might be tempting fate.
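To illustrate the direction of that bias, here is a toy simulation in Python (the income distribution and response probabilities are entirely made up): when high-income households are less likely to respond, the measured share of income going to the top tenth comes out lower than the true figure.

```python
import random

random.seed(42)

# A "true" population of household incomes (lognormal-ish, purely illustrative)
population = [random.lognormvariate(10, 0.8) for _ in range(100_000)]

def top10_share(incomes):
    """Share of total income held by the richest 10% of households."""
    ordered = sorted(incomes)
    cutoff = int(len(ordered) * 0.9)
    return sum(ordered[cutoff:]) / sum(ordered)

# Suppose well-off households respond less often than everyone else
median_income = sorted(population)[len(population) // 2]
respondents = [
    y for y in population
    if random.random() < (0.6 if y > 3 * median_income else 0.9)
]

print(f"'True' top-10% income share:     {top10_share(population):.3f}")
print(f"Share measured from respondents: {top10_share(respondents):.3f}")  # lower
```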

In the meantime, if you’re as tired as I am of clicking on those ‘yes’, ‘no’ and ‘don’t know’ boxes, try to be more selective. Think about who’s asking you for information, and who’s going to benefit. Is it a firm, which has already squeezed your wallet a bit and now wants to squeeze your brain as well? Or is it some other sort of research which, potentially, is somehow targeting the greater good?



Timing: Evaluation’s Dirty Little Secret

Apologies in advance for the slightly sensational headline. I’m trying not to let these blog posts get mired too deeply in the earnest and the technical.

What I’m about to tell you isn’t exactly news, and it is far from salacious. Yet, it is hugely important for thinking about pretty much any change you ever hear or read about. These can be changes in a patient’s health after starting a new medication; the sprouting and blossoming of an iris in your garden in the spring; changes in a baseball player’s batting average after he adjusts his swing; and changes in someone’s earnings after they get a college degree. It applies to virtually anything that involves cause and effect. Open a newspaper, listen to the radio, pay attention to what people talk about at work or over dinner. Most of the time, it comes down to a story of ”this happened” or ”so-and-so did this,” and as a result, x, y, and z occurred.

So what is the dirty secret with timing? It is this: your result will be heavily influenced by the point in time at which your study is conducted. Bear with me for a minute while I fetch my watercolors and brush, and paint you a fairly typical scenario from the development aid world.

Let’s imagine that, several years ago, a big project to help improve people’s lives was implemented. A new agricultural technique to grow vegetables was introduced in the country of Ruritania. Now the government that funded this project wants to know – how much bigger was the yield as a result? Was the money well spent? Are people better off?

Now imagine that, two years later, a world-class team of researchers has been assembled. They employ the most rigorous research design and methodological skills ever developed. Money is not an issue (this is my super best-case scenario). They design and implement an evaluation that asks the right questions, surveys the right people, and is statistically rigorous in all aspects. It includes qualitative research to understand why and how changes are or are not occurring. Supervision of the field research and quality control are outstanding. The results are checked and rechecked for accuracy. It is, in short, a legendary evaluation! The outcome – drum roll, please – is that average yields increased by 43.5% after two years.

Is that not a terrific result? Yes, most certainly! Our donor is thrilled. But the truth, folks, is that 43.5% is a totally arbitrary figure. Why? Because of the timing. That’s our secret. That lovely figure is culled from just one point in time. Come back in two years, apply the same methods, and you will get another answer. I guarantee it. Come back in 20 years, and maybe you’ll find that average yields have settled in at between 20 and 30% higher than they were before. Or maybe they’re back to square one, perhaps because of soil depletion. (We won’t even get into the 500 external factors which influence yields.)

You see, every development follows its own trajectory, and the point at which you take the measurement matters hugely. I owe this insight to Michael Woolcock of the World Bank, who emphasized its importance in a seminar I attended a few years ago on qualitative methods and the limits of quantifying change.

Change trajectories follow their own patterns, as the sketch below illustrates. Some start slowly and then accelerate. Think of the college graduate and her earnings. After graduation, she works for months in a fast-food restaurant, quits out of frustration and is jobless, but then a year later, bingo! she lands a high-paying job in Silicon Valley. (If you had measured the impact of her college degree after 6 months, it would have been disappointing, but if you measured the impact after one year, it would have been very positive!) On the other hand, some developments start rapidly, but eventually fade away. Think of the flower that blooms so prettily in spring, but eventually wilts, as she must. Some changes exhibit modest, steady improvement, and then level off – think of the patient who, after taking the pills, feels a bit better each day, and after a week has made a full recovery.
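Here is a small sketch in Python of that point (the trajectory shapes and numbers are purely illustrative, not drawn from any real project): the same underlying change yields very different "impact" figures depending on the year the measurement is taken.

```python
import math

def slow_then_fast(t):
    """Starts slowly, then accelerates (the graduate's earnings)."""
    return 100 / (1 + math.exp(-(t - 5)))      # climbs toward +100%

def fast_then_fade(t):
    """Jumps early, then fades away (the wilting flower)."""
    return 80 * math.exp(-0.4 * (t - 1) ** 2)

def rise_and_plateau(t):
    """Improves steadily, then levels off (the recovering patient)."""
    return 30 * (1 - math.exp(-0.5 * t))

for year in (1, 2, 5, 10):
    print(f"measured at year {year:>2}: "
          f"slow-then-fast +{slow_then_fast(year):5.1f}%, "
          f"fast-then-fade +{fast_then_fade(year):5.1f}%, "
          f"rise-and-plateau +{rise_and_plateau(year):5.1f}%")
# Which trajectory looks like a "success" depends entirely on when you show up.
```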

This subject is one of several touched on in a fascinating podcast on Freakonomics Radio that describes the results of research conducted by Raj Chetty and colleagues on a 1990s program, “Moving to Opportunity,” in which poor families in poor neighborhoods were given the chance to move elsewhere. The initial results, in terms of changes in earnings after a 10-year time lapse, were disappointing. But when they repeated the study after 10 more years, they found that the very young children in those families, who were now of working age, were seeing significant differences. It was a matter of timing (and of realizing that the main beneficiaries, at least in monetary terms, were the toddlers).

Photo credit: Phaedra Wilkinson

What I’ve described above is not an argument against doing evaluations, or an argument against trying to measure impacts as accurately as you can. It is a call to take into account the nature of our world. It is extremely rare for anything to change in a constant, upward direction.

So, what does this mean for evaluations? For users of evaluation research, it means: don’t take those scientifically precise results you get as immutable. They tell you about the magnitude of a change at a specific point in time. If sustainability is an important issue (and it usually is), you should do follow-up evaluations at regular intervals. If that sounds expensive, maybe don’t spend all your funds on one evaluation that will give you precise but chronologically arbitrary results.

For us evaluators, it means that we need to talk about the results we obtain in a way that values accuracy and rigor, yet doesn’t fetishize precision. We need to take the limitations into account while also probing more deeply into how changes are occurring. When conducting research, we must consider the trajectory of a change. We should take a deeper interest in the chronological context by asking questions about how the changes have come about: how quickly or slowly? What do people experiencing or observing the changes expect in the future? This may not be a scientific method, but I would argue it embodies the scientific spirit by asking important questions about how a change transpired, and what path it took. It means recognizing the ephemeral. It means accepting that, in most cases, we grasp at answers at a moment in time, and tomorrow those answers might be completely different.


Making the case for credibility

In an op-ed for the Washington Post on February 3, discussing journalism, Ted Koppel wrote that “we are already knee deep in an environment that permits, indeed encourages, the viral distribution of pure nonsense.” What is disconcerting is that many people may not care, as long as the nonsense aligns with their worldview.

Take note, evaluators, and anyone for whom collecting evidence is important. If until now the critical issue was ensuring your evidence was credible, henceforth the challenge may be convincing others that credibility even matters. We have entered an era in which information has gone from being something more or less firm to something fluid.

The term ‘post-truth’ was selected by Oxford Dictionaries as the word of the year for 2016, defined as “Relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.” While Oxford Dictionaries has highlighted a serious problem with its selection, I think it would be more accurate to call post-truth the euphemism of the year. Post-truth just smells Orwellian; its academic-sounding prefix adds a gloss of respectability to an insidious practice. Post-truth is not related to truth in the way post-modernism is related to modernism. The term hardly deserves to be dignified. A better description is ‘anti-truth.’ This would more accurately and honestly convey what happens when half-truths and falsehoods are spread, aimed at degrading the consensus on reality and contaminating public discourse.

Yes, there are grey areas when it comes to information. It can have multiple meanings. You can argue opposing sides by marshaling selective facts to make your case. (Lawyers are trained to do this.) Thus, it is accurate to note that under the Obama administration (January 2009 to January 2017) unemployment fell from 7.8 to 4.8 percent, which is a good thing. But you can also point to a fall in labor force participation rates from 65.7 to 62.9 percent. Not such a good thing. But in order to have a meaningful argument rather than, say, a shouting match, the basic facts must be accepted by, and accessible to, all. If one side says, “we don’t trust the US Bureau of Labor Statistics (where these data come from), they’re just a bunch of liars,” then there is no basis for conversation.

What seems to be occurring is that one side has become increasingly less interested in engaging in a meaningful argument and is happy to make stuff up, i.e. invent facts. And when credible evidence is produced, it is now often derided as false. Take, for example, the controversy over Obama’s birth certificate: although the certificate was made available in 2011, as of 2016, 41 percent of Republicans in an NBC News|SurveyMonkey poll disagreed with the statement that “Barack Obama was born in the United States.”

Will the skepticism of credible sources filter down to the technical research work conducted in the social sciences? Let us hope not, although the new Administration’s gag order on scientists in federal agencies is not encouraging. We may have to confront a whole new dilemma. No longer will it be sufficient to provide credible evidence, transparency of methodology, and detailed information on sources. We may need to defend the very concept of credibility, and make a case to those who disagree for why credibility matters.


Are you willing to pay for that?

Prices go up. That’s part of life, whether we like it or not.  We can just go along with it, reduce our consumption, or…take a stand.  For governments providing a service for which they charge, it’s a balancing act. How to raise the price of something without causing hardship or protests, while still covering the cost of providing it?

Last year I was involved in designing a study that, among other things, assessed customers’ willingness to pay for better utility service. This meant asking people all over the country, as part of a household survey, whether they would pay more for better service.

Governments planning to develop or upgrade public services may be interested in knowing to what extent consumers are willing to bear the costs of investment, via higher rates. For example, if water supply service provision is substandard, governments will develop investment plans to improve quality, supply, access, etc. Governments can go to commercial or development banks to access financing up front. However, typically they will seek to recoup at least part of those costs by passing them on to customers via higher tariffs. That was what brought me to the country in question.

The survey would, ideally, help determine what poor households could afford and whether proposed new tariff levels would pose a hardship or not. An array of mitigation measures, including various kinds of subsidies, can be developed for those households deemed to need them.  Beyond that, governments also want to know about overall household tolerance for paying more. Will higher tariffs lead to higher non-payment levels? Will they bring people to the streets? Could proposed tariff increases fail to pass? In that case the whole investment strategy would be called into question.

The way a willingness to pay (WTP) study works is that respondents are asked whether they would pay more – either a specific amount, or a percentage of their current water bill. Originally, this method, known as contingent valuation, was used to estimate whether and how much extra people would pay for an environmental good, such as clean air or water.

There are all kinds of different ways of asking WTP questions. For example, you can ask a yes/no question, ask different households about different amounts and build a demand curve, or use an open-ended approach, asking them to volunteer an amount themselves. Naturally, how you formulate the question will affect the answer. And getting reliable answers is a challenge. Some people hold their cards close to their chest, unwilling to reveal that they might pay more. Consciously or unconsciously, they enter a bargaining scenario, like at a bazaar, hoping to get the best deal. Others may overstate what they are willing to pay, perhaps trying to convince the government to just hurry up and get on with the investment, and worry about the tariffs later (when maybe they won’t go up quite so much).
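As a hedged sketch of the "different households, different amounts" variant mentioned above (the bid levels and acceptance counts are invented), you can trace out a simple demand curve by tabulating the share of households that say yes at each proposed tariff increase, and derive a crude summary of willingness to pay from it:

```python
# Hypothetical dichotomous-choice WTP data: each subsample is offered one bid
# (a percentage increase over the current water bill) and answers yes or no.
bids = {
    5:  {"asked": 200, "yes": 150},
    10: {"asked": 200, "yes": 110},
    20: {"asked": 200, "yes": 60},
    40: {"asked": 200, "yes": 20},
}

# The acceptance rate at each bid level traces out a downward-sloping demand curve
for bid in sorted(bids):
    share_yes = bids[bid]["yes"] / bids[bid]["asked"]
    print(f"+{bid:>2}% tariff increase: {share_yes:.0%} say they would pay")

# A crude (conservative) point estimate of mean WTP: sum the acceptance rates,
# each weighted by the width of its bid interval.
mean_wtp, previous_bid = 0.0, 0
for bid in sorted(bids):
    share_yes = bids[bid]["yes"] / bids[bid]["asked"]
    mean_wtp += share_yes * (bid - previous_bid)
    previous_bid = bid
print(f"Rough mean WTP: about a {mean_wtp:.0f}% tariff increase")
```

In practice a contingent valuation study would fit a proper model (logit or probit) to such responses rather than the step-wise tally above; the sketch is only meant to show the shape of the data this question format produces.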

Almost all WTP methods are quantitative. The idea is to collect data from a sample of households that represents a population of interest (a city, a region, a country). Now, getting back to our survey: about half of respondents replied they would not be willing to pay a cent more for water and sanitation improvements. Among the half that did say they’d pay more, the increase they would accept was about 5%. Fair enough.

What was interesting, however, was that when we broached the subject of paying more during a focus group – covering questions very similar to our survey, but in an open-ended, more in-depth manner – the share of people saying they’d pay more for better service was very high. Almost everyone said yes. Of course, this is not a scientific comparison – a thousand households vs. ten people sitting around a table. But it got me thinking again about the manner in which we as researchers engage with our subjects. How much does the context matter? Focus group discussions, by their nature, foster open dialogue and exchange of ideas. They tend to put the research subjects on a different footing. They are given more agency; they are able to share their thoughts and ideas with a moderator who guides the conversation. (Unlike the survey interviewer, the focus group moderator is not trying to get through a long list of questions, looking for a limited range of responses.)

This got me thinking about a hypothesis that might be worth testing. It goes something like this: asking about people’s willingness to pay as part of a back-and-forth conversation (i.e. focus group) rather than via a survey questionnaire leads to a greater stated willingness to pay. In the context of a conversation people can ask clarifying questions, can explain their reasoning, can describe under what conditions they would be willing to cough up more. Engaging them more as equals, as ‘experts’ so to speak, in the matter of their own consumption habits, leads to a different place.  Trust is probably going to be higher when people don’t feel so much like they’re just going to be a data point.

Methodological purists may argue, “well, of course you’re getting different responses, since you’re asking questions in a different way.” My response is – that’s exactly the point. To me, the more relevant question is: to what extent do the research methods mimic the real-world conditions under which higher tariffs are pushed through?

While it would certainly be more difficult to analyze qualitative research on WTP than quantitative survey data, let us test what the former approach can add. Engaging customers through such an approach, treating them as thoughtful partners with valuable perspectives on a public service rather than only as data points, could be useful in informing tariff strategy. It could yield more information about the conditions under which people would countenance paying more, why or why not, and how best to engage them on this often difficult topic.

The next time you ask someone to pay more for a service, don’t just take ‘no’…or ‘yes’ for an answer.