I4R Discussion Paper Series #8


Abel Brodeur (University of Ottawa), Nikolai Cook (Wilfrid Laurier University, Waterloo), Anthony Heyes (University of Birmingham)

We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell us about 𝒑-Hacking and Publication Bias in Online Experiments

Amazon’s Mechanical Turk is a widely used tool in business and economics research, but how trustworthy are results from well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020, we find evidence of widespread p-hacking, publication bias, and over-reliance on results from plausibly under-powered studies. Even setting aside questions arising from the characteristics and behaviors of study recruits, the conduct of the research community itself substantially erodes the credibility of these studies’ conclusions. The extent of the problems varies across the business, economics, management, and marketing research fields (with marketing especially afflicted). The problems are not improving over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility.
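A common way to detect the kind of p-hacking the abstract describes is a caliper test: count test statistics falling just above versus just below a significance threshold and ask whether the split is plausible under chance. The sketch below is illustrative only, with made-up z-statistics and an assumed caliper width, not the paper's actual procedure or data.

```python
from math import comb

# Hypothetical z-statistics from published tests (illustrative numbers only)
z_stats = [1.10, 1.88, 1.97, 2.01, 2.05, 1.99, 2.10, 0.85, 1.93, 2.03, 3.20, 1.96]

CRITICAL = 1.96   # 5% two-sided critical value for a z-test
WIDTH = 0.10      # assumed caliper half-width around the threshold

just_below = sum(1 for z in z_stats if CRITICAL - WIDTH <= z < CRITICAL)
just_above = sum(1 for z in z_stats if CRITICAL <= z < CRITICAL + WIDTH)

# Exact two-sided binomial test: absent p-hacking, each statistic inside
# the caliper should be equally likely to land on either side of 1.96.
n = just_below + just_above
k = max(just_below, just_above)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2**n)

print(f"below: {just_below}, above: {just_above}, p-value: {p_value:.3f}")
```

With real samples of thousands of tests, an excess of statistics just above the threshold (a surplus of `just_above`) is the classic bunching signature of selective reporting.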