I4R Discussion Paper Series #124


Douglas Campbell (New Economic School), Abel Brodeur, Anna Dreber, Magnus Johannesson, Joseph Kopecky (Trinity College Dublin), Lester Lusher (University of Pittsburgh, IZA), Nikita Tsoy (INSAIT, Sofia University)

The Robustness Reproducibility of the American Economic Review

We estimate the robustness reproducibility of key results from 17 non-experimental AER papers published in 2013 (8 papers) and 2022/23 (9 papers). We find that many of the results are not robust, with no improvement over time. The fraction of significant robustness tests (p < 0.05) varies between 17% and 88% across the papers, with a mean of 46%. The mean relative t/z-value of the robustness tests varies between 35% and 87%, with a mean of 63%, suggesting selective reporting of analytical specifications that exaggerate statistical significance. A sample of economists (n = 359) overestimates robustness reproducibility, but their predictions are correlated with observed reproducibility.
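The two paper-level summary measures can be illustrated with a minimal sketch. It assumes that the relative t/z-value of a robustness test is its t/z-value divided by the originally reported t/z-value, signed so that positive values point in the direction of the original estimate. This definition is an assumption made for illustration and is not stated in the text above. The function name `robustness_summary` and all numbers are hypothetical and are not taken from any of the 17 papers.

```python
from statistics import mean


def robustness_summary(original_t, robustness_t, robustness_p, alpha=0.05):
    """Summarize the robustness tests for one original result.

    Assumed definitions (illustrative, not taken from the paper):
      - share_significant: fraction of robustness tests with p < alpha
      - relative_t: mean of (robustness t/z-value) / (original t/z-value)
    """
    if len(robustness_t) != len(robustness_p):
        raise ValueError("need one p-value per robustness test")
    share_significant = mean(1.0 if p < alpha else 0.0 for p in robustness_p)
    relative_t = mean(t / original_t for t in robustness_t)
    return share_significant, relative_t


# Hypothetical example: an original z-value of 2.5 and four robustness tests.
share, rel = robustness_summary(
    original_t=2.5,
    robustness_t=[2.5, 2.0, 1.0, 0.5],
    robustness_p=[0.012, 0.046, 0.317, 0.617],
)
print(f"share significant: {share:.0%}, mean relative t/z: {rel:.0%}")
```

In this made-up case, half of the robustness tests remain significant at the 5% level and the robustness t/z-values average about 60% of the original one.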