Computational reproducibility matters

In August, Nature Human Behaviour introduced code review for submissions based on computational modelling and new software. Why does computational reproducibility matter, and what steps are we taking to ensure code reliability?


Nature Human Behaviour's August Editorial announced our new code peer review policy, which will apply to all submissions relying on computational modelling and new software. The journal's move towards systematic code reproducibility checks means that the authors of computational papers will have to follow stricter guidelines for code preparation, storage, and documentation. Editors will ensure that the code is appropriately documented, stored, licensed, and cited. At least one of our reviewers will have the appropriate computational background to verify that the code runs as described, and that it reproduces the findings reported in the study.

Our decision to take a more active role in promoting computational reproducibility comes at a time when more and more studies in social and behavioural sciences rely on mathematical models, computer simulations, and custom-built software tools. Over the last year alone, hundreds of computational studies modelled and simulated interactions between human behaviour, mobility, social distancing strategies, and the epidemiological dynamics of COVID-19; their predictions informed government responses and policy decisions. Some of the most significant theoretical breakthroughs of the last decade in economics, sociology, evolution of behaviour, machine learning, and network science relied on computational tools.

But the code behind the main findings of these computational studies is not always available as part of the publication record; when it is, readers, reviewers, and editors do not always know whether the code is reliable, and whether it actually produces the reported outcomes. The practices surrounding computational modelling in fields relevant to human behaviour seem to have evolved in ways that make verification and reproducibility difficult, without strong incentives to code with reproducibility and transparency in mind. There are no universal community standards governing code readability, documentation, and storage.

Code review conducted by a subset of Nature Research journals, along with similar initiatives in other behavioural science communities, aims to change this culture, moving towards a more universal standard that recognizes that code and appropriate descriptions of computational workflows are as important to scientific progress as any other part of the published paper.

Different views of code review

The truth is, the field of computational modelling is not immune to the same reproducibility issues that have plagued experimental disciplines for decades. The authors of a 2018 study investigating reproducibility of findings published in a specialist computational journal could not easily reproduce the results reported in 67% of articles. Having a background in computational modelling, I know all too well that researchers often struggle to replicate their own computational findings.

Over the last few months, while preparing to roll out our new policies geared towards ensuring computational reproducibility, I had a chance to speak to computational social scientists, analytical sociologists, and other researchers whose work relies on developing custom computer code. All agreed that the field stands to benefit from adopting stricter standards of code review and publication. But I also noticed differences in their views and interpretations of the computational reproducibility issue, and the role that peer review may play. Given their diverse research backgrounds and different norms in their respective fields, this did not come as a big surprise.

A sociologist respondent said that the current trust-based system has traditionally worked well in their field, and that good textual and mathematical model descriptions already ensure a level of reproducibility. Others pointed to computational studies that had to be retracted following the discovery of fundamental coding issues that completely invalidated the main claims, and they said that their own modelling work would have benefited from independent code verification. Some computational social scientists thought that code review will offer an additional mechanism encouraging researchers to write more readable code and provide better documentation, and that it will increase readers' trust in pure modelling studies.

There were also scientists who were concerned about how demanding and time-consuming code peer review may be, given their already overcrowded schedules. Even though all of them use code in their work, some were not confident that they have the necessary skills to review others' code.

What our code review is not

There is a difference between code review that independently reproduces computational steps and verifies the reported outcomes, and other types of quality control measures that are more common in software engineering, such as debugging, optimization, and scrutinizing algorithmic accuracy.

All types of quality assurance are important. Ideally, other researchers should also be able to reproduce computational findings from scratch, based on textual or mathematical descriptions provided in the manuscript. But the first step towards making sure that code can be published as a reliable part of the publication record is simple verification of functionality and reproduction of the reported results.

And so, our new code review policies do not mandate that reviewers perform a comprehensive line-by-line reading of the source code, or that they scrutinize performance, authors' algorithmic decisions, or code syntax. Our code review is not debugging, optimization, or function testing. Having full access to the source code, and the relevant computational background, reviewers are welcome to provide feedback on specific function tests and algorithms, and to look for potential errors, but ultimately these are author responsibilities.

Our new computational reproducibility policies apply to submissions in which a computational model itself is the main scientific contribution. These are the studies based on agent-based simulations, machine learning tools, population-genetic and social-network models, among others. We will not be asking our reviewers to verify code when it is only used for data processing, statistical analyses, and other standard procedures that are not themselves central to the study, and we will only ask experts in relevant computational modelling frameworks (not all reviewers) to verify the code.

Best practices

There is a lot that authors can do to prepare their code for reproducibility checks, from providing appropriate documentation to maintaining a user-friendly code structure.

Alongside their data files and source code, we require our authors to include a README file. The README is a key document explaining how to use the code to reproduce the results reported in the paper. It should explain what is included in the code submission, provide an installation guide, state system requirements, list external dependencies, and describe parameters. It should include a detailed guide with all steps that the reader needs to follow to exactly reproduce the reported outcomes. There is currently no standard structure that a README file should follow, but data editors of several economics journals maintain a useful README template that can serve as an example of how detailed a README file should be.
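As one illustration of how the system requirements and dependency list for a README might be gathered, the following Python sketch collects the interpreter version, the operating system, and the installed versions of a set of packages. The function names and the package names are placeholders chosen for this example; they are not a format or a list required by the journal.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version, platform string, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def format_environment(env):
    """Format the environment description as plain text for a README."""
    lines = [
        f"Python {env['python']}",
        f"Platform: {env['platform']}",
        "Dependencies:",
    ]
    for name, version in sorted(env["packages"].items()):
        lines.append(f"  {name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dependency list; replace with the packages the code uses.
    print(format_environment(describe_environment(["numpy", "matplotlib"])))
```

The printed text can be pasted into the README so that a reviewer knows which versions the reported results were produced with.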

The way authors write and organize their code also matters. If the code has external dependencies that cannot be provided with the submission, the authors may want to include setup scripts that automatically install them. A reviewer should not have to edit any of the source files to replicate the findings, even when the output depends on user-supplied parameters. The code should ideally be organized so that it outputs the final results as reported in the paper (for instance, entire figures) and not intermediate results.
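A minimal Python sketch of this kind of organization is shown here, assuming a toy model in place of a real simulation. Parameters are passed on the command line, so no source file needs editing; the random seed is explicit, so repeated runs give the same result; and the script writes a final result file. All names here (run_model, the flags, result.json) are hypothetical and serve only as an example.

```python
import argparse
import json
import random


def run_model(n_agents, n_steps, seed):
    """Toy stand-in for a simulation: fraction of agents adopting a behaviour."""
    rng = random.Random(seed)  # local seeded generator makes runs repeatable
    adopted = [False] * n_agents
    adopted[0] = True  # a single initial adopter
    for _ in range(n_steps):
        learner = rng.randrange(n_agents)
        model = rng.randrange(n_agents)
        if adopted[model]:
            adopted[learner] = True  # learner copies an adopting agent
    return sum(adopted) / n_agents


def main(argv=None):
    parser = argparse.ArgumentParser(description="Reproduce a (hypothetical) result.")
    parser.add_argument("--agents", type=int, default=100)
    parser.add_argument("--steps", type=int, default=1000)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--out", default="result.json")
    args = parser.parse_args(argv)

    # Store the parameters alongside the outcome so the file documents itself.
    result = {
        "agents": args.agents,
        "steps": args.steps,
        "seed": args.seed,
        "adoption_fraction": run_model(args.agents, args.steps, args.seed),
    }
    with open(args.out, "w") as fh:
        json.dump(result, fh, indent=2)
    return result


if __name__ == "__main__":
    main()
```

If the file were saved as reproduce.py (an assumed name), a reviewer could run, for example, python reproduce.py --seed 7 --out result.json and compare the output with the value stated in the paper. A real submission would typically end by producing the published figure itself.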

Our editorial role

One of the editorial missions of Nature Human Behaviour is to promote transparent and reliable research practices in behavioural and social sciences. We do this by publishing registered reports, systematic reviews, and important null results, by supporting metascience projects, and by encouraging replication studies.

Now, these efforts also include our obligation to hold authors to stricter standards of computational reproducibility, and to support authors in preparing their code for submission and review.

Even before a computational paper is sent out for review, we will work with the authors to ensure that the code is appropriately stored and documented, and that it is presented in a form suitable for reproducibility checks. Reviewers performing computational reproducibility checks will be experts in both computational modelling and the main field of research, and they will already have the technical expertise required to review the code alongside the rest of the paper. At acceptance, we will again guide the authors through code publication, ensuring that the code is appropriately stored, licensed, and cited.


[Poster image credit: 86thStreet: Chuck Close, Subway Portraits, MTA, CC BY 2.0]

Arunas Radzvilavicius

Editor, Springer Nature

Editor, Nature Human Behaviour.
