Source: Quartz, Dec 2017
the Implicit Association Test (IAT), from Yale’s freshmen to millions of people worldwide. Referencing the role of implicit bias in perpetuating the gender pay gap or racist police shootings is widely considered woke, while IAT-focused diversity training is now a litmus test for whether an organization is progressive.
This acclaimed and hugely influential test, though, has repeatedly fallen short of basic scientific standards.
The forgiving notion of unconscious prejudice has become the go-to explanation for all manner of discrimination, but the shaky science behind the IAT suggests this theory isn’t simply easy, but false. And if implicit bias is a weak scapegoat, we must confront the troubling reality that society is still, disturbingly, all too consciously racist and sexist.
The latest scientific research suggests there’s a very good reason why these well-meaning workshops have been so utterly ineffectual. A 2017 meta-analysis that looked at 494 previous studies (currently under peer review and not yet published in a journal) from several researchers, including Nosek, found that reducing implicit bias did not affect behavior. “Our findings suggest that changes in measured implicit bias are possible, but those changes do not necessarily translate into changes in explicit bias or behavior,” wrote the psychologists.
“I was pretty shocked that the meta-analysis found so little evidence of a change in behavior that corresponded with a change in implicit bias,” Patrick Forscher, psychology professor at the University of Arkansas and one of the co-authors of the meta-analysis, wrote in an email.
In recent years, a series of studies have led to significant concerns about the IAT’s reliability and validity. These findings, raising basic scientific questions about what the test actually does, can explain why trainings based on the IAT have failed to change discriminatory behavior.
First, reliability: In psychology, a test has strong “test-retest reliability” when a user can retake it and get a roughly similar score. Perfect reliability is scored as 1, meaning that when a group of people repeatedly take the same test, their scores always rank in exactly the same order. It’s a tough ask. A psychological test is considered strong if it has a test-retest reliability of at least 0.7, and preferably over 0.8.
Current studies have found the race IAT to have a test-retest reliability score of 0.44, while the IAT overall is around 0.5; even the high end of that range is considered “unacceptable” in psychology. It means users get wildly different scores whenever they retake the test.
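To make those reliability numbers more concrete, here is a minimal sketch of how a test-retest score can be computed. It assumes reliability is estimated as the Pearson correlation between two sittings of the same test, which is a common convention but not necessarily the exact method used in the studies cited here. The data are simulated, and the names `pearson`, `simulate_retest`, and `noise_sd` are hypothetical, chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each session")
    return cov / math.sqrt(var_x * var_y)


def simulate_retest(n_people, noise_sd, seed=0):
    """Simulate two sittings of a test (toy model, not real IAT data).

    Assumption: each person has a stable underlying score with variance 1,
    and every sitting adds independent random noise with standard
    deviation noise_sd.
    """
    rng = random.Random(seed)
    stable = [rng.gauss(0, 1) for _ in range(n_people)]
    first = [s + rng.gauss(0, noise_sd) for s in stable]
    second = [s + rng.gauss(0, noise_sd) for s in stable]
    return first, second


if __name__ == "__main__":
    for noise_sd in (0.5, 1.0):
        first, second = simulate_retest(5000, noise_sd)
        print(f"noise_sd={noise_sd}: reliability ~ {pearson(first, second):.2f}")
```

Under this toy model, the expected correlation is 1 / (1 + noise_sd**2). A reliability near 0.5 would therefore correspond to random noise roughly as large as the stable trait being measured, while reaching 0.8 would require the noise to be about half that size.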
The second major concern is the IAT’s “validity,” a measure of how effective a test is at gauging what it aims to test. Validity is firmly established by showing that test results can predict related behaviors, and the creators of the IAT have long insisted their test can predict discriminatory behavior. This point is absolutely crucial: after all, if a test claiming to expose unconscious prejudice does not correlate with evidence of prejudice, there’s little reason to take it seriously.
Four separate meta-analyses, undertaken between 2009 and 2015 and each examining between 46 and 167 individual studies, all showed the IAT to be a weak predictor of behavior. Two of the meta-analyses focus on the race IAT while two examine the IAT’s links with behavior more broadly, but all four show weak predictive abilities.
When Banaji and Greenwald first came up with the phrase “implicit bias,” they claimed it reflected thinking that is “unavailable to self-report or introspection.” The research showing that people are aware of their implicit biases suggests this definition is suspect.
The meta-analyses showed that the IAT is no better at predicting discriminatory behavior (including microaggressions) than explicit measures of bias, such as the Modern Racism Scale, which evaluates racism simply by asking participants to state their level of agreement with statements like, “Blacks are getting too demanding in their push for equal rights.”
In a 2014 paper by Banaji, Greenwald, and Nosek, the authors seemed to acknowledge the concerns raised about the test: “IAT measures have two properties that render them problematic to use to classify persons as likely to engage in discrimination,” they wrote, pointing to the test’s poor predictive abilities and test-retest reliability.
But when I asked them directly, Greenwald and Banaji doubled down on their earlier claims. “The IAT can be used to select people who would be less likely than others to engage in discriminatory behavior,” wrote Greenwald in an email.
The meta-analyses and other psychologists I spoke to strongly disagree: “There is also little evidence that the IAT can meaningfully predict discrimination,” notes one paper, “and we thus strongly caution against any practical applications of the IAT that rest on this assumption.”