April | 2017 | Evaluate This

Check your outlier – is it a symptom or an anomaly?

The shocking United Airlines ejection of a passenger was an outlier

This week I’d like to talk about outliers. These are people, events, or data points that are so far from the norm that they attract unusual amounts of attention. Outliers make the news. Most of the news stories you read concern exceptions or unexpected events. They grab our attention. The disturbing United Airlines scene of a paying passenger being forcibly and violently ejected from the plane represents just such an outlier.

Last week, a passenger was brutally dragged off a United Airlines flight in Chicago by security guards who broke his teeth and nose, leaving him with blood streaming down his face and a concussion. He had refused to give up a seat he had paid for, and United decided that he and three other passengers (chosen randomly according to some algorithm) had to leave the plane. The airline had offered travel vouchers (worth $800), but apparently no one had taken them. So it decided – time to draw straws. And what for? To make space for four airline employees arriving at the last minute for the flight bound for Louisville, Kentucky.

The scene of security personnel pulling the 69-year-old Asian American (he was born in Vietnam) through the aisle to the horrified looks and gasps of other passengers was filmed on smartphones and duly posted to the web. It has generated a huge outcry and calls for a boycott of United. The company made things worse when CEO Oscar Munoz issued a terse non-apology, blaming the passenger, Dr. Dao, for being “disruptive” and “belligerent”. Munoz wrote: “I apologize for having to re-accommodate these customers,” which is about as far away from saying sorry as it gets before entering antonym territory.

Meanwhile, there are plenty of news articles predicting that the public will get over it, and United Airlines will weather the storm. This is because many companies have overcome this type of scandal in the past, and because many customers don’t have a lot of choice when it comes to airlines. Consolidation among the big legacy airlines, blessed by the regulators, has ensured that.

The shocking incident is highly unusual, which is why it generated so much attention. We are used to hearing about passengers being escorted from flights for misbehaving, but in this case it was the airline which misbehaved (even if it did outsource the job to airport security).

Outliers – good for the news, but not so good for research?

In the research world, outliers are not a news opportunity. Most of the time, they’re viewed as a problem, sticking out like a sore thumb. Outliers raise questions about data reliability and validity. They distort the mean, leaving an inaccurate impression, even if the data is 100% correct. (Remember the one about how Bill Gates walks into a bar and suddenly everyone is a billionaire, on average?) The solution? Outliers are typically ignored or dropped from datasets by researchers.
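The Bill Gates joke is easy to check with a few lines of Python. This is a minimal sketch using made-up net worths (the ten patrons and the $90 billion figure are assumptions for illustration, not real data); it compares the mean with the median before and after the outlier walks in.

```python
from statistics import mean, median

# Hypothetical net worths (US dollars) of ten people in a bar; illustrative only.
patrons = [40_000, 55_000, 62_000, 75_000, 80_000,
           90_000, 110_000, 125_000, 150_000, 200_000]
before = (mean(patrons), median(patrons))

# One extreme outlier walks in, with an assumed net worth of $90 billion.
with_outlier = patrons + [90_000_000_000]
after = (mean(with_outlier), median(with_outlier))

print(f"Before: mean ${before[0]:,.0f}, median ${before[1]:,.0f}")
print(f"After:  mean ${after[0]:,.0f}, median ${after[1]:,.0f}")
```

The mean leaps from under $100,000 to more than $8 billion, while the median barely moves (from $85,000 to $90,000). That is why the median is often reported alongside, or instead of, the mean when a dataset contains extreme values.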

But outliers can mean different things. They can be symptomatic or anomalous. Maybe an outlier highlights a larger problem, represents the tip of the iceberg, a leading indicator, a canary in the coal mine, i.e., a symptom of some larger phenomenon or a trend that is about to break. Or maybe it represents an exception to the rule, a bad apple, and can justifiably be disregarded. It is also possible that the outlier is an error. Before deciding how to deal with or react to outliers, we need to understand what they mean. If they’re signaling something, then even researchers need to take a closer look at them. Like a doctor, you should check your outlier in order to make a diagnosis.
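No formula can tell you whether an outlier is a symptom or an anomaly; that takes the kind of closer look described above. But a common first step is to flag outliers rather than silently delete them, so they stay in the dataset until someone has examined them. The sketch below uses Tukey’s fences (values more than 1.5 times the interquartile range below the first quartile or above the third), a widely used convention rather than a method prescribed here; the function name `flag_outliers` and the scores are hypothetical.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Hypothetical helper: pair each value with True if it falls outside
    Tukey's fences (Q1 - k*IQR, Q3 + k*IQR). Nothing is removed."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(v, not (low <= v <= high)) for v in values]

# Hypothetical survey scores; one respondent sits far from the rest.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 58]
flagged = flag_outliers(scores)

for value, is_outlier in flagged:
    if is_outlier:
        print(f"{value}: outlier, review before deciding whether to keep it")
```

Raising `k` (to 3, say) flags only the most extreme cases. Either way, the output is a list of cases to look into, not a list of cases to throw away.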

In a 2002 article, Vijayendra Rao advocates “having tea with an outlier,” i.e. looking more closely at what they represent, and maybe even talking to them if they represent a survey respondent, to get a different perspective on the issues.

The outlier may have different intrinsic characteristics that set it apart. I was once asked to find a poor person in a village in Kazakhstan in a region where I was conducting an evaluation for the Asian Development Bank. It wasn’t easy, because the people we talked to didn’t really consider themselves poor and had to scratch their heads when we asked them to point us toward a poor household. Finally, my team members and I were directed to the home of a single mother. We brought her some groceries and knocked on her door. She invited us in and we sat down and talked to her. It turned out that she had emigrated from Ukraine (I can’t remember whether her husband had died or merely left her) and thus lacked a social support network. She had problems with her papers. There were also health issues. I don’t believe we actually had tea. She was an anomaly, an exception. She didn’t represent the typical inhabitants of the region. While we learned about what kind of factors might drive people into poverty, her case didn’t tell us much about poverty issues among the population as a whole.

The case for symptomatic

But if the outlier represents an extreme case of a phenomenon that is happening to a lesser degree elsewhere, then it takes on a different meaning. What do we have with United? I would argue that the case is symptomatic and not anomalous. Indeed, although it was well outside the norm (the chances of a passenger getting bumped from a flight remain 1 in 10,000, and the chance of losing your teeth in the process remains vanishingly small), the mood against United has been building for some time and helps explain the outrage. United certainly did not have a good customer service reputation prior to the incident, and the extreme mistreatment represents, in many people’s eyes, all that is wrong with the company. The frustration and anger over poor service boiled over. United’s reputation was already solidly second-rate. It ranks 66 out of 100 global airlines according to one survey.

My own experience flying United is far from pleasant, and presumably widely shared. Anyone who has flown with a European, Gulf region or Asian airline will know that US carriers in general deliver poor service. While with the better international carriers you might feel as though you were their guest, on most American carriers you feel like a revenue source that, inconveniently, must be processed, takes up physical space, and requires (minimal) attention. They get away with it because of a lack of competition, and because they know passengers will put up with it in exchange for relatively low prices.

The irony is that Americans on the whole don’t tend to be unfriendly; quite the opposite. But I can only assume that, once hired by an airline, they are processed through some kind of training module which strips away as much of their humanity as possible (although you will occasionally interact with a friendly crew member or ground staff whom the system clearly failed to process).

Finally, CEO Munoz’s initial reaction, backing his staff and essentially blaming the passenger, was very telling and fairly indicative of United Airlines’ attitudes in general. Judging by that reaction, Munoz did not consider the incident a big deal. In other words, to him it was within the bounds of normalcy. That suggests it was symptomatic, not anomalous. Granted, Munoz did issue a proper apology days later, but that was tainted by the strong suspicion that it was a reaction to the airline’s share price dropping a bit and not some sort of recognition that its approach to customers in general is woefully lacking in common decency.

So while the news articles arguing that United will survive this debacle may be correct, that does not mean this particular extreme behavior is meaningless. I believe the evidence supports the view that it reveals a lot. It is symptomatic of a much larger problem – in a word, disrespect toward passengers, those important but still annoying revenue streams.

(If you’re curious, I will do my best to avoid giving United my business in the future, even if it costs extra.)

Inadvertent airline humor

I leave you with some one-liners. They are taken verbatim from a link on United Airlines’ own website called, without a trace of irony, Shared Purpose and Values:

We Fly Right: On the ground and in the air, we hold ourselves to the highest standards in safety and reliability. We earn trust by doing things the right way and delivering on our commitments every day.
We Fly Friendly: Warm and welcoming is who we are.
We Fly Together: As a united United, we respect every voice, communicate openly and honestly, make decisions with facts and empathy, and celebrate our journey together.
We Fly Above & Beyond: With an ambition to win, a commitment to excellence, and a passion for staying a step ahead, we are unmatched in our drive to be the best.

Even setting aside the passenger ejection incident, anyone who has ever flown on United – an airline with some of the most unhelpful and unfriendly employees in the world – will be forced to acknowledge that they do have a dry sense of humor.

Nils Junge April 18, 2017

Give me a number, any number

When interviewing people as part of an evaluation, at some point I like to put them on the spot.

The interviewees will be well-informed about the program under evaluation. That’s how they were selected. They might be policy makers, managers, program implementers, sector specialists or some other type of what we in the business call “key informants,” or “KIs.” The interviews are semi-structured, with a pre-determined set of questions or topics. That means the answers can be open-ended, in contrast to surveys where most responses need to be kept succinct. The open-ended format allows the interviewer to probe, follow up on or clarify particular points. It’s not so different from how a journalist interviews a subject, or a police detective interviews a suspect. It’s a process of discovery as well as a matter of answering straightforward questions about the program.

During the interview, as we go through the questions and the respondent shares her assessments and opinions (often in many shades of grey) I’ll press her to take a stand and defend her position. I’ll ask for a number. I want a number that summarizes, say, her subjective assessment of the program.

Sure, I like words. I’m a writer, after all. You can learn a lot from words, but there are just so darn many out there! In its ability to synthesize a story, a number can almost be poetic…

In the middle of a discussion about Program A, I’ll ask something like: “Now, based on everything you know about this energy development project, how would you rate its impact on a scale of 1 to 5, where 1 means you noticed no impact at all and 5 means you noticed a very strong impact?” The key informant will come up with a number, say a “4.” I make a note of it, and then follow up with something like “Please elaborate. Tell me why you rated it a 4.” And the interviewee builds a case, offering more details, providing a rationale.

Sometimes the response you get is a surprise. A key informant will be criticizing a program left and right, but then rate it 4 out of 5. Why the apparent disconnect? Apparently all those criticisms carried less weight for the respondent, in the grand scheme of things, than I had assumed. You see, if I hadn’t asked him to give me a number, I might well have walked away from the interview thinking, “Hmm, he thought that program sucked.” In fact, he thought the program was pretty good, but just had some caveats. You can find the opposite scenario as well, of course. A person speaks positively about a program, then rates it a 3. It could be that they just had very high expectations.

These are scale questions, referred to as Likert-type scale survey questions after the psychologist who invented the concept, Rensis Likert. Of course, such rating questions are extremely common in surveys. Who hasn’t responded to an online or in-person survey that included a scale question? Websites such as Amazon, Netflix, TripAdvisor, and Yelp use a similar approach to get us to rate products or services.

The concept has something in common with the efficient market hypothesis, which states that share prices reflect all currently available information. All the negatives, positives, and expectations are priced into that one number. Doctors use a pain scale to gauge a patient’s chronic pain. Therapists will ask their clients to rate their depression using a scale. Similarly, evaluators might use a scale to understand the degree of effectiveness of an intervention, or any number of other issues.

Typically, the scale question is used for closed-ended interviews, as part of surveys. Responses can be analyzed and used to obtain an aggregate measure for all the 1,243 respondents, as well as different subgroups. For example, you might find that, on average, female participants rate the program’s ability to improve their lives at 4.3, while male participants rate it at 3.8. Done well, this line of inquiry can be a valuable method for taking a population’s pulse on an issue.
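Aggregating scale responses by subgroup is simple arithmetic. Here is a minimal sketch in Python; the `mean_by_group` helper and the ten responses are hypothetical, and the data are not meant to reproduce the 4.3 and 3.8 figures above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (respondent's gender, rating on a 1-5 scale).
responses = [
    ("female", 5), ("female", 4), ("female", 4), ("female", 5), ("female", 3),
    ("male", 4), ("male", 3), ("male", 4), ("male", 5), ("male", 3),
]

def mean_by_group(rows, low=1, high=5):
    """Hypothetical helper: average the ratings for each subgroup,
    rejecting any value that falls outside the scale."""
    groups = defaultdict(list)
    for group, rating in rows:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
        groups[group].append(rating)
    return {group: mean(ratings) for group, ratings in groups.items()}

overall = mean(rating for _, rating in responses)
by_group = mean_by_group(responses)

print(f"overall: {overall:.2f}")
for group, avg in sorted(by_group.items()):
    print(f"{group}: {avg:.2f}")
```

One caveat: Likert responses are ordinal, so averaging them treats the gap between 1 and 2 as equal to the gap between 4 and 5. That is standard practice in many evaluations but is debated; reporting the median or the full distribution of responses alongside the mean is a common safeguard.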

The nice feature of open-ended interviews with key informants is that you are able to do a little digging after they’ve coughed up a number. In a survey, if you ask respondents to explain their answers, things can get complicated – those longish answers can’t be easily summarized numerically. (Of course, open-ended answers can be coded according to type, but then you lose a lot of that rich detail along the way.) You don’t face that constraint when conducting qualitative research. You face other constraints instead.

I find that rating questions are a good way of cutting through the dense jungle of information you can be pulled into when doing research. You’re walking along, making your observations of the flora and fauna, taking as much in as you can, using your machete to clear the way, trying to figure out whether you’re heading in the right direction (i.e. testing incoming information against your hypothesis, the path which may or may not lead you to the truth). And then you emerge onto a rocky outcropping…and all at once see the whole rainforest spread out below you. Aha! So that’s how the program looks.

I’ve found that, far from seeing it as a reductive or sterile exercise, people being interviewed rather like this type of questioning. They appear to enjoy the exercise of mentally processing large amounts of information on a subject to generate a single number. And they like explaining how they got there.

Essentially, this is a way to leverage people’s cognitive functions. You’re engaging them in a kind of meta-cognition exercise, in which they examine and explain their own thought processes.

Try it out on yourself. On a five-point scale, rate your satisfaction with, say, the place you live; your own job performance…or how much of your life you spend online. Then justify that number in words. You will most likely find that your brain immediately begins sorting through a whole succession of factors, lining up the pros and cons, weighing them against each other.

It may be that I’ve spent far too much of my life evaluating stuff, but I honestly find this exercise quite revealing, stimulating even, in a cerebral sort of way.

Nils Junge April 11, 2017
