The Science of COVID-19

As the coronavirus spreads across the globe, it has also inspired a great number of research projects aiming either to find a cure or to ease the many burdens caused by the pandemic. Here I use network and data science to briefly survey some of these research directions.


More than 700 research articles, comments, opinions, and reviews related to COVID-19 have been published by Springer Nature alone during the past few months. Adding the preprints from the arXiv, medRxiv, and bioRxiv brings the total to more than five thousand articles in less than four months, from late January until the 19th of May. This corresponds to roughly 43 new pieces each day. While the publication rate appears to have stopped accelerating on the preprint portals, the daily number of published works is still growing at Springer Nature, which probably signals that the preprints are slowly making it through peer review (Figure 1).

Figure 1. The number of articles posted by Springer Nature, medRxiv and bioRxiv combined, and the arXiv. The graph shows the averaged trends using a moving window of 4 days.
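The smoothing mentioned in the caption can be sketched as follows. This is a minimal illustration, not the code behind the figure: the function names are hypothetical, and a trailing (rather than centered) 4-day window is an assumption, since the caption does not say which was used.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(dates, start, end):
    """Count articles per calendar day; days without uploads count as 0."""
    counts = Counter(dates)
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]


def moving_average(values, window=4):
    """Trailing moving average (assumed); the first window-1 days yield no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up posting dates, for illustration only.
posted = [date(2020, 5, 1), date(2020, 5, 1), date(2020, 5, 3), date(2020, 5, 4)]
series = daily_counts(posted, date(2020, 5, 1), date(2020, 5, 5))
smoothed = moving_average(series, window=4)
```

Each smoothed value is the mean of the current day and the three days before it, which damps weekday and weekend fluctuations in the raw daily counts.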

The three data sources correspond to somewhat different (though overlapping) scientific communities: the arXiv is more popular in physics and computer science, while life science research is mostly found on the medRxiv and the bioRxiv. Further differences may arise from the fact that while the preprint sites collect preprints, Springer Nature offers a selection of published pieces.

Next, I analyzed the differences between these sources by comparing their topics, based on the articles’ titles. When the 20 most frequent keywords of the three data sources are compared (obtained from the article titles by dropping stop-words and applying lemmatization), it turns out that the terms most specific to the arXiv are:

 'network', 'data', 'spread', 'dynamic', 'learn', 'social'

 While the medRxiv and bioRxiv can be characterized by:

 'study', 'transmission', 'clinical', 'china', 'novel'

 And the Springer Nature articles by:

 'science', 'cell', 'new', 'research', 'human', 'daily', 'time', 'cancer', 'vaccine', 'test', 'briefing', 'drug'

These keywords tell us that the arXiv preprints mostly focus on specific topics related to data-driven modeling, such as social dynamics and network spreading, while the medRxiv and the bioRxiv showcase more generic clinical research. The top keywords of Springer Nature are centered more around drug research and potential vaccines, and also reflect news and updates for the community.
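The keyword comparison above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original analysis code: the stop-word list and the lemma map are tiny hypothetical stand-ins for a full text-processing library, and a term is treated as "unique" to a source when it appears in that source's top list but in no other source's top list, which is one plausible reading of the comparison.

```python
import re
from collections import Counter

# Tiny illustrative stand-ins (assumptions); a real analysis would rely on
# a complete stop-word list and a proper lemmatizer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "by", "from"}
LEMMAS = {"networks": "network", "dynamics": "dynamic", "learning": "learn", "studies": "study"}


def keywords(title):
    """Lowercase, tokenize, drop stop-words, and map tokens to their lemma."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def top_keywords(titles, n=20):
    """Return the n most frequent keywords across a list of titles."""
    counts = Counter(k for title in titles for k in keywords(title))
    return [word for word, _ in counts.most_common(n)]


def unique_terms(top_by_source):
    """Keep, per source, the top keywords that no other source has in its top list."""
    result = {}
    for source, words in top_by_source.items():
        others = set()
        for other, other_words in top_by_source.items():
            if other != source:
                others.update(other_words)
        result[source] = [w for w in words if w not in others]
    return result


# Made-up titles, for illustration only.
titles = {
    "arxiv": ["Learning dynamics on networks", "Networks and data"],
    "medrxiv": ["Clinical studies of transmission", "Data from clinical study"],
}
top = {source: top_keywords(ts, n=20) for source, ts in titles.items()}
distinct = unique_terms(top)
```

In this toy input, 'data' appears in both top lists, so it is dropped from both sets of distinctive terms.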

By constructing the co-occurrence network of the top 100 keywords from each source, we can gain further insights into the trending COVID-19 research topics (Figure 2). Further directions emerge from this topic network: human mobility and contact research; applications of deep learning-based image recognition to CT scans; policy and health systems; various mechanisms behind the virus itself, such as its transmission properties and typical receptor proteins; and more detailed descriptions of COVID-19 and SARS.

Figure 2. The co-occurrence network of the article keywords. To construct this network, I collected the titles of the more than five thousand articles, used text processing tools to extract their keywords, and picked the 100 most frequent ones from each dataset. Each node represents one keyword, while the link between two keywords shows how often they co-occur in the same titles. The size of each node is proportional to the total number of mentions of its keyword.
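The network construction described in the caption can be sketched as below. This is a minimal, standard-library illustration with hypothetical names; it assumes that keywords have already been extracted per title and that a pair of keywords is counted once per title in which both appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(title_keywords, vocabulary):
    """Build node sizes and edge weights from per-title keyword lists.

    title_keywords: list of keyword lists, one per article title.
    vocabulary: set of keywords to keep (e.g. the most frequent ones).
    """
    node_sizes = Counter()    # total mentions of each keyword
    edge_weights = Counter()  # number of titles in which a pair co-occurs
    for kws in title_keywords:
        kept = [k for k in kws if k in vocabulary]
        node_sizes.update(kept)
        # Sort so each undirected pair is always stored under the same key.
        for a, b in combinations(sorted(set(kept)), 2):
            edge_weights[(a, b)] += 1
    return node_sizes, edge_weights


# Made-up keyword lists, for illustration only.
sample = [
    ["network", "spread", "data"],
    ["network", "spread"],
    ["vaccine", "network"],
]
nodes, edges = cooccurrence_network(sample, {"network", "spread", "vaccine"})
```

The resulting counters map directly onto a weighted graph: `nodes` gives the node sizes and `edges` the link weights, which can then be handed to any graph library for layout and drawing.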

Milan Janosov

PhD in Network Science, Central European University

Physicist, network scientist
