Behind CoronaNet: How we built our dataset

By: Cindy Cheng, Luca Messerschmidt, Joan Barceló, Allison Spencer Hartnett, Vanja Grujic, Robert Kubinec, Timothy Model, and Caress Schenk


Like most people, we read about the SARS-CoV-2 outbreak in China at the beginning of the year with abstract concern, and then with increasing unease as cases sprang up outside China. The cold gravity of the situation finally hit home when the World Health Organization (WHO) declared a global pandemic on March 11. Governments everywhere began implementing a series of policies unimaginable only a few weeks before: the United Arab Emirates had closed all schools by March 9, and on March 19 the state of Bavaria (Germany) instituted a lockdown.

In those early days, we saw our travel plans evaporate as borders closed and case counts went up. We coped with the uncertainty about the nature and effects of the virus by obsessively reading variations of the same vague reports, accomplishing little but wracked nerves and a sense of helplessness. In the absence of clear guidance from the WHO as to which policies were effective against COVID-19, Robert Kubinec and Cindy Cheng independently started collecting data on government responses to COVID-19. Having learned of each other’s efforts through what is now the COVID-19 Social Science Research Tracker, they made contact on Monday, March 23.

What happened next was the seemingly improbable takeoff of the CoronaNet Research Project. This academic collaboration has produced a database of, at the time of writing, more than 45,000 policies documenting government responses to the COVID-19 pandemic, an article in Nature Human Behaviour describing the effort, and numerous working papers using the data. No week on the project has been more head-spinning than its first. By Tuesday, March 24, both Joan Barceló and Luca Messerschmidt had signed on to the effort. Cindy drafted the codebook by Wednesday, Robert began training thirty-odd research assistants (RAs) on Thursday, and we launched our data collection effort via a Qualtrics survey by Saturday. By the first week of April, Allison Hartnett had also joined to guide and manage our data validation efforts. Since then, we have welcomed three new co-PIs who have played invaluable roles in ensuring the success of the project: Tim Model has professionalized our data pipeline, and Caress Schenk and Vanja Grujic have bolstered our RA training and management.

From the start, we faced immense organizational challenges, which we dealt with by building streamlined internal structures and institutions. The three main challenges were, and continue to be, the recruitment, training and motivation of RAs. With regard to recruitment, a week after the launch of the survey, 30 RAs had become 210, and we have since held steady at around 500 active RAs spanning 18 different time zones. Though Luca bore the brunt of this logistical challenge at first, by early May we had developed a project management team to deal with the recruitment and onboarding of new RAs.

Training different people from all over the world to document policies in a consistent way has presented its own challenge. Initially, we required that all RAs watch a recording of our original training video and use a centralized platform, Slack, for communication. We built on this foundation by developing additional tools and mechanisms to support and train the RAs, including more comprehensive training materials, a Shiny App to help RAs visualize the data and a training assessment to evaluate RA skills. 
 
Motivating RAs to continue with the project as volunteers is perhaps the biggest challenge of all. While we continuously search for funding opportunities to compensate as many RAs as we can, we deal with the immediate reality of working with volunteers in a number of ways. Early on, we organized a select group of RAs to help monitor general well-being and ensure efficient communication and feedback. By late May, we had further instituted a system of regional and country managers, which has proven instrumental in supporting, monitoring and managing the work of RAs. To motivate and empower research assistants not only to build their data collection skills but also to develop their own research skills, we encourage our junior scholars to leverage our data in their research, and we provide them with workshops and technical skills training (e.g. in R and statistics) to help them do so. We also provide a platform for disseminating RA research on our website and in a forthcoming working paper series.
 
In the midst of these efforts, we wrote up a paper about our collective data collection efforts and in the blink of an academic eye, published it in June. Publication is just the start. The project has continued to move forward, and is ever-evolving toward greater collaboration, communication and community, both externally and internally. 

Externally, we have received and continue to receive incalculable support from our home institutions, both in terms of material funding and in terms of support, advice, and counseling from our academic betters and peers. Our ability to identify raw policies to code was significantly bolstered by cooperation first with Jataware and then with Overton, which use machine-learning algorithms to find, respectively, news reports (Jataware) and government policies (Overton) related to COVID-19. Our most significant collaboration to date has been our participation in PERISCOPE, an academic consortium of 32 EU universities investigating the behavioural and socio-economic impacts of COVID-19. Funded by an EU Horizon 2020 grant, it will keep CoronaNet's data collection effort going for three more years. In the future, we aim to further strengthen international collaboration and coordination with other projects that gather information on COVID-19 policies. We recognize that all of us not only face the same issues (recruiting, training and motivating RAs) but are also driven by the same spirit: providing a public dataset that can help advance research and knowledge of the COVID-19 pandemic.
 
Internally, the goal has always been to come to a common understanding of how a policy should be coded. Doing so depends on the ability to pass on the specific knowledge of coding policies in a particular country as well as the general knowledge of how the project has evolved overall. 
 
Though we often compared our RAs to an army of coders early on to explain how the data was being gathered, we have come to appreciate that such metaphors miss the mark. While the project does depend on guiding an enormous number of diligent and public-spirited RAs to document government policies all over the world in a standardized way, organizing this work requires a great deal of flexibility and bottom-up communication. Indeed, though ultimately only one RA is responsible for coding a given policy, preparing the way to do so takes a community of scholars working together to assign and distribute the work, give advice on how to code different policies across countries, and share ideas and information about new policies on the horizon. Everybody plays an important part in ensuring that there is a common understanding of how to code a policy, and thus in the success of the project. With the end of the pandemic still in the distance, we anticipate further challenges ahead but are confident in the robustness of our community to handle them, and we invite anyone so inclined to join our efforts.

Cindy Cheng

Postdoctoral Research Fellow, Hochschule für Politik, Technical University of Munich