Fighting Blind: Why Bias Evades Measurement, and Why That Matters

Becker and Warshauer’s ISH project addresses the concern expressed in this piece that survey participants in studies on hateful attitudes may not be truthful when they know such attitudes are socially undesirable, and pilots an original technique to elicit truthful responses.

To combat hate we must first understand it. To understand it, we must first recognize it. And to recognize it, we must be able to define and measure it. In other words, we have to be able to identify how prevalent these attitudes are, and whether our interventions are changing them. Unfortunately, this first, most fundamental stage has also proven the most difficult.

A substantial body of social science research shows that we cannot simply ask people about hateful beliefs and behaviors and expect accurate responses: many of the people we ask will misreport or conceal them, telling us what they think we want to hear rather than what they actually believe. Whether due to self-censorship, social desirability, or fear of reprisal, direct questioning produces what researchers call "biased estimates"—results that systematically understate the prevalence of these attitudes.

Researchers have developed several techniques that attempt to handle this problem. Instead of asking direct questions about people's attitudes towards Black people, for instance, we conduct Implicit Association Tests (IATs), or probe their agreement with statements like "It's really a matter of some people just not trying hard enough: if blacks would only try harder they could be just as well off as whites." Yet these techniques may still prime respondents to the racial content of a study, thereby altering the behavior the researcher seeks to measure. As a result, we also employ survey experiments, such as list experiments and randomized response designs, that give respondents cover to answer honestly. Yet these techniques limit the kinds of questions we can ask, can prove confusing for respondents, and almost always lack a "ground truth" to compare the results against. They may reveal more bias than direct questioning, but we still do not know whether they reveal the true level of bias in the population.
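To make the logic of these indirect designs concrete, consider the list experiment (also called the item-count technique), one standard design of this kind. A control group reports how many items on a list of innocuous statements they endorse; a treatment group sees the same list plus the sensitive item. Because respondents report only a total count, no individual ever reveals their answer to the sensitive item, yet the difference in group means estimates its prevalence. The sketch below uses simulated data with a hypothetical 30% prevalence; all numbers are illustrative, not from any real study.

```python
import random

random.seed(0)

TRUE_PREVALENCE = 0.30    # hypothetical share holding the sensitive attitude
N_PER_GROUP = 20_000      # simulated respondents per group
N_BASELINE_ITEMS = 4      # innocuous items shown to both groups

def baseline_count() -> int:
    # Number of innocuous items a simulated respondent endorses.
    return sum(random.random() < 0.5 for _ in range(N_BASELINE_ITEMS))

# Control group: counts only the innocuous items.
control = [baseline_count() for _ in range(N_PER_GROUP)]

# Treatment group: same list plus the sensitive item. Respondents report
# only the total count, so no one individually admits the sensitive answer.
treatment = [
    baseline_count() + (random.random() < TRUE_PREVALENCE)
    for _ in range(N_PER_GROUP)
]

# The difference in mean counts estimates the sensitive item's prevalence.
estimate = sum(treatment) / N_PER_GROUP - sum(control) / N_PER_GROUP
print(f"Estimated prevalence: {estimate:.3f}")
```

The simulation assumes honest counting, which illustrates the text's caveat: the design grants anonymity, but it cannot certify that the recovered estimate matches the true level of bias in a real population, and the difference-in-means estimator is noisier than a direct question of the same sample size.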

These limitations have led much real-world practice to rely on approaches that we know, with a high degree of confidence, do not work. This is not merely an academic inconvenience. If we cannot measure the prevalence of hateful attitudes with confidence, we cannot know whether they are growing or receding, whether our interventions are working or backfiring, whether the problem is concentrated or diffuse. We are fighting a battle to reduce the prevalence and impact of these attitudes—one we may well be losing—not in the light of day, but under a strobe light that only intermittently illuminates our target while obscuring its movements. 

Consider one example. In the decades since the Civil Rights Movement, American employers have spent billions of dollars on diversity and implicit bias training programs. These programs are designed, in part, around IAT scores (whose interpretation remains contested) and around forms of direct questioning that respondents can easily game.

The results of these programs have been, to put it charitably, underwhelming. Sociologists Frank Dobbin and Alexandra Kalev, analyzing thirty years of data from more than 800 U.S. firms, found that five years after mandatory diversity training was introduced, the share of Black women in management had fallen by 9%. Other programs designed to limit bias, like job tests or grievance systems, had similarly discouraging results. The problem, they argue, goes beyond the training merely failing to work: forcing people through exercises designed to expose and remediate their biases can actually activate those biases rather than defuse them. In other words, we built an intervention on shaky measurement and evaluation, and the intervention may have made things worse.

Tiffany Green, an economist, and Nao Hagiwara, a social psychologist, writing in Scientific American, reached a similar conclusion about the healthcare industry. Despite billions spent on implicit bias training for clinicians (trainings that states like Michigan and California now mandate), there is still no rigorous evidence that it produces lasting changes in how doctors treat patients of different races, even though trainees can give the "correct" answers to questions about bias on their end-of-training questionnaires. Further, when analyzed longitudinally, the positive effects of diversity training rarely survive beyond a day or two.

This is the cost of fighting under a strobe light. We did more than just fail to measure the problem accurately. We assumed our measurements were the ground truth and built an entire industry of solutions on top of that failure. As we have seen in recent years, that industry may be consuming resources, patience, and political capital that could have gone toward creating more effective approaches. When the light steadies and we get a clearer view, what we find is not that we were standing still, but that we were moving in the wrong direction.

We do not have a clean solution—and we should be skeptical of anyone who claims otherwise. What we have instead is a set of imperfect tools, an obligation to use them carefully, and a responsibility not to let the difficulty of the task become an excuse for inaction. Measuring hatred is hard. It has always been hard. It is arguably getting harder. The answer is not to stop measuring, but to measure more carefully, more honestly, and with greater clarity about what we do not yet know.

__________________________________________________

Clayton Becker is a PhD Candidate in Political Science at UCLA. His research is primarily about local politics and the public input process, with a complementary research agenda focused on prejudice and prejudice reduction.

Connor Warshauer is a 4th year PhD student in Political Science at UCLA. He studies interest group politics, representation and voter preferences, and prejudice and discrimination. His research emphasizes how surveys can better be used to elicit respondents’ true beliefs and attitudes.