It sounds more like another day in Vegas for the psychology field of that era. This was not an exception, and often researchers were not even aware they were doing something wrong. Even nowadays psychology researchers are clueless as to what p-hacking and bad statistical practices really mean. And because they consider themselves honest researchers, while these practices are obviously dishonest, they do not consider themselves to be doing anything like that - but somebody else may! It is always somebody else. It is not as bad now as it was in that period, and there is more awareness about the most blatant violations of statistical rigour, but the actual understanding is still low for the median researcher, so many grey-to-black zones still exist.
Unfortunately true. In other words: "often researchers were too stupid to ever have been let into university as freshmen".
It is sad to see stereotype threat being one of those findings that seems less and less credible. I once worked as a research assistant on a project related to stereotype threat, and I recall the study going through several iterations because it all needed to be just so -- we were testing stereotypes related to women and math, but the effect was expected to be strongest for women who were actually good at math, so it had to be a test that would be difficult enough to challenge them, but not so challenging that we would end up with a floor effect where no one succeeds. In hindsight, it's so easy to see the rationale of "oh, well we didn't find an effect because the test wasn't hard enough, so let's throw it out and try again" being a tool for p-hacking, file drawer effects, etc. But at the time...it seemed completely normal. Because it was.
I'm no longer in the field, but it is genuinely heartening that the field is heading toward more rigour, more attempts to correct the statistical and methodological mistakes, rather than digging in one's heels and prioritizing theory over evidence. But it's a long road, especially when trying to go back and validate past findings in the literature.
While I'm sure it is an honest statement, this sentiment is itself concerning. Science is ideally done at a remove - you cannot let yourself want any particular outcome. Desire for an outcome is the beginning of the path to academic dishonesty. The self-restraint required to accept an unwanted answer is perhaps THE most important selection criterion for minting new academics, apart from basic competency. (Academia also has a special, and difficult, responsibility to resist broader cultural trends that seep into a field demanding certain outcomes.)
This basically never happens. I worked in academia for many years, and in psychology for some of that, and I have never met a disinterested scientist.
Like, you need to pick your topics, and the research designs within them, etc., and people don't pick things that they don't care about.
This is why (particularly in social/medical/people sciences) blinding is incredibly important to produce better results.
> The self-restraint required to accept an unwanted answer is perhaps THE most important selection criteria for minting new academics,
I agree with this, but the trouble is that this is not what is currently selected for.
I once replicated (four times!) a finding seriously contrary to accepted wisdom and I basically couldn't get it published honestly. I was told to pretend that I had looked for this effect on purpose, and provide some theory around why it could be true. I think that was the point where I realised academia wasn't for me.
Now, the same thing happens in the private sector, but ironically enough, it's much less common.
If tomorrow they say "growth mindset" is also a non-replicable phenomenon, will HN be full of smug people saying "I knew it all along, lol!"?
https://en.wikipedia.org/wiki/Why_Most_Published_Research_Fi...
>Biostatisticians Jager and Leek criticized the model as being based on justifiable but arbitrary assumptions rather than empirical data, and did an investigation of their own which calculated that the false positive rate in biomedical studies was estimated to be around 14%, not over 50% as Ioannidis asserted.
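For context, both estimates rest on the same underlying arithmetic: the share of positive published findings that are true (the positive predictive value) depends on the prior probability that a tested hypothesis is true, statistical power, and the significance threshold. A minimal sketch of that standard formula (the numbers below are illustrative, not taken from either paper):

\[
\mathrm{PPV} = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)}
\]

With \(\alpha = 0.05\), power \(1-\beta = 0.8\), and a prior \(\pi = 0.1\), \(\mathrm{PPV} = 0.08/(0.08 + 0.045) \approx 0.64\), so roughly 36% of positive findings would be false; raise the prior to \(\pi = 0.5\) and the false share falls to about 6%. The arithmetic isn't in dispute between Ioannidis and Jager/Leek; as the quote says, the disagreement is over whether the inputs should come from assumptions or from empirical data.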
Combine that with people upgrading uncertainty to certainty post-hoc when debunking comes out and you have these entertaining things. Overall, I’m glad you called it out. Once I wished I had a profile of people’s past guesses to see how good they actually are, and now I have Manifold, Kalshi, and Polymarket.
It was entirely reasonable to be skeptical of stereotype threat when the concept was new, a priori. An "unsurprising" result is not necessarily one that someone confidently believed. If I flip a coin, I'm not "surprised" when the result is heads, nor when the result is tails.
Of course people may also adopt it as a personal philosophy, but that's separate.
Let this be a lesson: even if something seems like science and it confirms your bias - that doesn't mean it is true. You should look more closely at things you embrace than at things you reject, lest you embrace a lie.
If “crackpots” turn out to be right when “reasonable people” and “science” were wrong—and this is far from the only instance of this happening—maybe we should reevaluate some things.
Just because a peer reviewed paper published in a prestigious journal says something doesn’t mean it’s true. Even a survey of multiple peer reviewed papers published over time isn’t necessarily determinative if there are common methodological issues or publishing biases. Yes, a lot of times actual crackpots will make stupid criticisms, but not every criticism is stupid even if it comes from outside the ivory tower.
In particular, whether or not a big rock falls faster than a small rock [1] is a fairly basic question in physics, which has one of the most certain answers. Virtually nothing in psychology, let alone human social psychology, is at that level of certainty, and any psychologist worth their salt will agree with that. Basically any finding in psychology should have a level of certainty somewhere between “yeah, that’s probably mostly true” and “hmm, interesting hypothesis that’s not entirely crazy, I wonder if it holds up”.
[1] Also, the literal question of whether a big rock or a small rock falls faster is trickier than you might assume if you only know the middle school version of the question. If we're doing this on the moon, so as to dispense with the tricky aerodynamic questions, and we are answering the question from the frame of reference of an astronaut standing on the surface of the moon, the bigger rock actually does fall faster. Both rocks accelerate towards the center of the moon at the same rate, yes, but gravity works both ways. Both rocks pull on the moon, and the bigger rock pulls harder, accelerating the moon toward it very slightly more. Which, from the surface frame of reference, is equivalent to the bigger rock falling faster.
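A quick way to formalize this point (the standard two-body result, stated here as a sketch rather than the commenter's own derivation): treat rock and moon as point masses \(m\) and \(M\) at separation \(r\), and look at their relative acceleration.

\[
a_{\text{rel}} = \underbrace{\frac{GM}{r^2}}_{\text{rock toward moon}} + \underbrace{\frac{Gm}{r^2}}_{\text{moon toward rock}} = \frac{G(M+m)}{r^2}
\]

Both rocks feel the same \(GM/r^2\), but the closing acceleration measured from the lunar surface grows with the rock's mass \(m\). For any rock an astronaut can lift, \(m/M\) is so tiny that the correction is unmeasurable, which is why the middle school answer is the right one in practice.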
This is the way forward -- preregistered studies. That, together with a promise from the publisher to publish the result regardless of whether the effect is found to be significant.
When you think about it, the incentives for publishing in science have been wrong all along. The future will be different: It will be full of null results, of ideas people had that didn't pan out. But we'll be able to trust those results.
Unfortunately, "Stereotype Threat" is not one of the effects they attempted to replicate.
Exploiting researcher degrees of freedom remains unfortunately extremely common. There needs to be some sort of statistical vanguard in the ivory towers enforcing real preregistration and good analysis practices. Strict epistemic discipline is necessary to do real science.
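To make "researcher degrees of freedom" concrete, here is a toy simulation (entirely illustrative; the function names and parameters are my own, not drawn from any study). Both groups are sampled from the same distribution, so every significant result is a false positive by construction, yet testing five outcome measures and reporting only the best one more than quadruples the false-positive rate.

```python
import math
import random
import statistics

def t_pvalue(a, b):
    """Two-sided Welch-style p-value via a normal approximation
    (adequate for n >= 30; a sketch, not a stats library)."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))); two-sided tail probability.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_study(n=40, n_outcomes=5, hack=False):
    """One null 'study': both groups drawn from the SAME distribution.
    With hack=True, test n_outcomes independent outcome measures and
    report only the smallest p-value."""
    ps = []
    for _ in range(n_outcomes if hack else 1):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        ps.append(t_pvalue(a, b))
    return min(ps)

random.seed(0)
trials = 2000
honest = sum(run_study(hack=False) < 0.05 for _ in range(trials)) / trials
hacked = sum(run_study(hack=True) < 0.05 for _ in range(trials)) / trials
print(f"false-positive rate, one preregistered outcome: {honest:.3f}")  # ~0.05
print(f"false-positive rate, best of five outcomes:     {hacked:.3f}")  # ~0.23
```

With five independent tests the chance of at least one p < 0.05 is 1 - 0.95^5, about 0.23. Preregistration blocks exactly this move: the outcome measure is fixed before the data exist, so shopping for the minimum p-value is no longer available.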
In the physical sciences you can often directly measure a phenomenon and have a theory that makes very specific predictions.
Anecdotally, I’ve seen with my own eyes, for example, girls getting really into coding only after seeing it demonstrated by enthusiastic women whom they can see as role models in ways they would not see men.
I guess this is a far broader thing than stereotype threat - but I’m sure this larger thing is real. I fear that people who themselves have stereotypes in mind about who ‘should’ be into certain topics will use the demise or deemphasis of stereotype threat to justify not making attempts to attract or be friendly to kids who really could flourish in non-stereotypical fields - to their and society’s detriment.
There are so many studies showing "X manipulation affects Y outcome", but there's not even a hint of an attempt to explain the mechanisms in a meaningful way (cognitive experiments are usually better, but often still guilty of this).
We could do with a shut up and experiment movement, to be honest.
In order to build useful theories, we need lots more data, and forcing theories onto data actively holds us back from running the wide variety of experiments necessary for a real theory to emerge.
I guess my peeve is that nobody really collaborates on theories. Every PI is off working on their own set of pet theories (because that’s how you establish a career), but there’s really no centralizing force to get people to collaborate and work on a shared theory. Physics has the standard model that everyone can use as a common reference point and it’s helpful.
It's great seeing the original author admit the problem.
I’d love others to read the replication report and explain why I might be wrong?
The author rightly observes that, despite the undeniable shift, women are still significantly underrepresented in STEM, and that the shift therefore cannot explain the lack of replication. There are still many other reasons besides innate differences that might explain it.
Part of the context of that time was that _The Bell Curve_ had been published fairly recently and there was great desire to disprove it and anyone doing that could count on lots of attention and speaking fees. So the grift was to present stereotype threat as this grand solution that could resolve all racial differences.
> When Black students at Stanford University were told that a test was diagnostic of intellectual ability, they performed worse than their white counterparts. However, when this stereotype threat was ostensibly removed—by simply framing the test as a measure of problem-solving rather than intelligence—the performance gap between Black and white students nearly vanished.
Just reading this description motivates me to reject the study out of hand. It's not plausible that university-level students responded meaningfully differently to being told "this is a test of problem-solving skill" versus "this is a test of intelligence" because it is commonly understood that problem-solving skill is a major component of intelligence.
>it also became the darling of the political left who now had an answer to prevailing views of group differences held by the political right. This is partly because shortly before stereotype threat took its turn in the spotlight, Charles Murray and Richard Herrnstein published The Bell Curve... the octogenarian Murray is still considered a pariah, shouted down and deplatformed from talks he tries to deliver at respectable colleges to this day.
The characterization of Murray's views in the last several years has been grossly uncharitable and seems entirely disconnected from his actual arguments. It's strange that the book is 30 years old, but has seemed politically relevant for much less time than that.
This part is definitely true. It has also been my experience -- but more so within academic areas with sloppy "researchers" and political problems. Linguistics is pretty good, gender "research" is awful. Intelligence research is pretty damn good, social "science" and psychology (outside of psychometrics) is awful. Economics is awful, apart from the basic ideas of competition, capitalism, and low taxes. Keynesianism is largely a fraud.
> My conclusion was that his book is junk.
This is unlikely to be true.
Even the replication attempt had two scenarios:
1. One where women were told the test was to establish performance levels on the test between men and women.
2. One where they were told the test was a test of problem-solving skill (or they were primed to disregard negative stereotypes before the test).
So even the replication has the incorrect framing you worried about. I tend to believe the problem wasn't this, but the way the field was lax about sampling, methodology, etc. After all, there were many stereotype threat studies, not just this one, boasting similar results. And they didn't all use that framing.
Stereotype threat: "individuals who are part of a negatively stereotyped group can, in certain situations, experience anxiety about confirming those stereotypes, leading paradoxically to underperformance, thus confirming the disparaging stereotype." For example, if you remind a woman of the "women are bad at math" stereotype, she will perform worse on a math test than if she is not reminded of that stereotype.
>Let’s play “Find the Lebowski quotes game” again!
So, yeah, I find this a deeply unserious blog post.
I'm assuming you are not aware of studies not mentioned here that replicate - I'm not in this field and so I would not know where to look. I'm guessing that you also are not in this field and are looking for some way to allow your bias to become true despite these issues - but of course I might be wrong.
So, thus far, there is actually not enough evidence to throw the "racism exists" baby out with the "do these interventions affect racist belief?" bathwater. And since we're second-guessing biases, it's super weird that you're always in the comment sections of (politically charged) articles concerning fields you're not in.
No, it does not make that claim.
If stereotype threat is real, we should be able to have a replicable study result that confirms it, right? We're not just limited to logical inference, I hope.