Solving the replication crisis by replicating less
Scientific findings must be replicable. But what exactly should be replicated? We model a scientific community and find that it maximizes knowledge acquisition when findings are not routinely replicated before publication, and only those published findings that the community finds interesting are replicated.
Only half of all published findings in cognitive psychology replicate, and the figure is even lower in social psychology. Similar problems have been observed in other disciplines including cancer research. Given that reproducibility of phenomena is a fundamental attribute of science, the massive concern within the scientific community over the “replication crisis” is unsurprising.
Numerous suggestions have been advanced to enhance replicability, ranging from increasing statistical power through preregistration of methods and analysis to improved theorizing. Underlying all those suggestions is a strong commitment to making replications mainstream, as a recent target article in Behavioral and Brain Sciences argued very eloquently.
However, to date there has been relatively little emphasis on exploring the circumstances under which replications are or are not advisable. Should every study be replicated? If so, by whom? Or should replication be reserved for findings that are important enough to warrant the effort?
Klaus Oberauer and I examined those issues in an article that just appeared in Nature Communications. We modeled a hypothetical scientific community and compared various replication regimes, seeking a regime that maximized the accumulation of true knowledge. The take-home message of our modeling is that the publication of potentially non-replicable studies minimizes cost and maximizes efficiency of knowledge gain for the scientific community under a broad range of assumptions.
At first glance, this conclusion may appear counterintuitive. However, we show that it is remarkably robust and an inevitable consequence of one crucial assumption: that the scientific community does not find all published results to be of equal interest.
Scientists publish their findings because they find them interesting and exciting. And scientists always hope that their findings will have an impact and will be taken up by the scientific community. Alas, most findings leave hardly a trace and receive few (or no) mentions on Twitter or citations in the literature. Only a few findings strike a sweet spot in the scientific community and everyone starts talking about them—at conferences, on Twitter, or by citing the findings in the literature.
Our model instantiates this lopsided distribution of interest by considering the actual citation counts that articles in psychology tend to receive: few articles are widely cited and most articles receive just a few mentions. The moment one accepts this skewed distribution—and it absolutely does not matter whether one uses citations or some other measure of interest—the question of who should replicate findings, and when, becomes crucial.
In our model, we explored two replication regimes: Under one regime, which we call “private”, all findings are replicated by the author before publication to guard against subsequent replication failures. Findings that do not replicate are not published. Under an alternative regime, which we call “public”, all significant findings are published and hence their replicability is uncertain upon publication. Published studies are replicated only when the scientific community considers them to be interesting—and as we just noted, few findings are considered interesting and so few published findings are replicated.
When we compare those two regimes, one pervasive finding emerges: The private replication regime incurs a considerably greater cost, in terms of total number of experiments conducted, than the public replication regime to uncover the same new knowledge.
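The cost difference can be illustrated with a toy simulation. The sketch below is not the model from our article; it is a minimal illustration in Python, and every parameter value (the share of real effects, statistical power, the significance level, and the share of findings the community finds interesting) is an assumption chosen for illustration. It also assumes that interest is unrelated to whether an effect is real, and it counts as "knowledge" only real effects that are interesting and have been successfully replicated.

```python
import random


def simulate(n_studies=10_000, base_rate=0.1, power=0.8, alpha=0.05,
             p_interesting=0.1, seed=1):
    """Toy comparison of the two regimes; every default is an assumed value."""
    rng = random.Random(seed)
    cost_private = cost_public = 0    # total experiments run in each regime
    known_private = known_public = 0  # replicated, interesting, real effects

    for _ in range(n_studies):
        effect_is_real = rng.random() < base_rate
        p_significant = power if effect_is_real else alpha
        original_significant = rng.random() < p_significant
        # Outcome a replication would have if it were run.
        replication_significant = rng.random() < p_significant
        # Assumption: interest is independent of whether the effect is real.
        interesting = rng.random() < p_interesting

        # Both regimes pay for the original experiment.
        cost_private += 1
        cost_public += 1
        if not original_significant:
            continue  # non-significant results are not pursued in this sketch

        # Private regime: the author replicates every significant finding.
        cost_private += 1
        if replication_significant and interesting and effect_is_real:
            known_private += 1

        # Public regime: publish right away, replicate only if interesting.
        if interesting:
            cost_public += 1
            if replication_significant and effect_is_real:
                known_public += 1

    return {
        "cost_private": cost_private,
        "cost_public": cost_public,
        "known_private": known_private,
        "known_public": known_public,
    }


result = simulate()
print(result)
```

Because both regimes are run on the same simulated studies, they end up with the same count of confirmed interesting findings, while the private regime pays for one extra experiment on every significant result that nobody turns out to care about.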
Every replication of a study that no one ultimately cares about represents an opportunity cost. Effort that could have been put towards a more productive purpose has been wasted. Given that waste of resources has been identified as a major adverse consequence of the replication crisis, our results caution against routine prepublication replication of new findings.
Our finding holds regardless of whether one uses conventional frequentist statistical methods or Bayesian statistics. It holds when only positive findings are considered interesting as well as when null results are also considered interesting, and it applies to theory-guided research as much as to discovery-oriented (exploratory) research. Perhaps ironically, the scientific community overall accumulates more knowledge over time if findings are published before their replicability has been established.
Our model nonetheless heeds calls for a strong “replication culture”: By freeing up resources, adoption of our model would facilitate powerful, large-scale replications that are particularly resource intensive. Our model is also compatible with the overwhelming expert opinion that replications are more useful if they are conducted by someone other than the original author.
There are, however, some concerns that arise from the public replication regime in our model. First, the model has nothing to say about the distribution of workload. The public replication regime is demonstrably more efficient overall, but it is unclear who benefits from the savings. Is it junior post-docs? Senior professors? Students? Are the savings distributed evenly across genders?
Second, and perhaps most concerning, published studies will continue to fail to replicate, which at first glance calls into question the wisdom of the public replication regime. Successful adoption of our model therefore also requires a reform of the current publication regime. To that end, our article contains the following recommendations: “We suggest that the public-replication regime can live up to its promise if
(1) non-replicated findings are published provisionally and with an embargo (e.g., 1 year) against media coverage or citation.
(2) Provisional publications are accompanied by an invitation for replication by other researchers.
(3) If the replication is successful, the replicators become co-authors, and an archival publication of record replaces the provisional version.
(4) Replication failure leads to a public withdrawal of the provisional publication accompanied by a citable public acknowledgement of the replicators. This ensures that replication failures are known, thus eliminating publication bias.
(5) If no one seeks to replicate a provisional finding, the original publication becomes archival after the embargo expires with a note that it did not attract interest in replication. This status can still change into (3) or (4) if a replication is undertaken later.”
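One way to read these recommendations is as a small set of publication statuses with rules for moving between them. The Python sketch below is only an illustration of that reading: the status and event names are hypothetical labels, not terms from the article, and treating a withdrawn or a replicated archival publication as final is an assumption, because the recommendations do not say what happens after those points.

```python
from enum import Enum, auto


class Status(Enum):
    PROVISIONAL = auto()            # recommendations (1) and (2)
    ARCHIVAL_REPLICATED = auto()    # recommendation (3)
    WITHDRAWN = auto()              # recommendation (4)
    ARCHIVAL_UNREPLICATED = auto()  # recommendation (5)


class Event(Enum):
    REPLICATION_SUCCEEDED = auto()
    REPLICATION_FAILED = auto()
    EMBARGO_EXPIRED = auto()


# Allowed status changes; any pair not listed leaves the status unchanged.
TRANSITIONS = {
    (Status.PROVISIONAL, Event.REPLICATION_SUCCEEDED): Status.ARCHIVAL_REPLICATED,
    (Status.PROVISIONAL, Event.REPLICATION_FAILED): Status.WITHDRAWN,
    (Status.PROVISIONAL, Event.EMBARGO_EXPIRED): Status.ARCHIVAL_UNREPLICATED,
    # "This status can still change into (3) or (4)" if replicated later.
    (Status.ARCHIVAL_UNREPLICATED, Event.REPLICATION_SUCCEEDED): Status.ARCHIVAL_REPLICATED,
    (Status.ARCHIVAL_UNREPLICATED, Event.REPLICATION_FAILED): Status.WITHDRAWN,
}


def next_status(status, event):
    """Return the status that follows `event`, or `status` if nothing changes."""
    return TRANSITIONS.get((status, event), status)


# A finding that attracts no interest during the embargo, then fails a late replication.
status = Status.PROVISIONAL
for event in (Event.EMBARGO_EXPIRED, Event.REPLICATION_FAILED):
    status = next_status(status, event)
print(status.name)
```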
Solving the replication crisis requires a cultural shift and requires that well-targeted replications become mainstream. But routine replication before publication, perhaps paradoxically, is not the best solution.