We are living in a world teeming with online information. Every day, we create, read and disseminate online content, which collectively forms a virtual network of information diffusion. To me, as an infectious disease modeler, the idea of using epidemic models to simulate information diffusion is irresistible. Indeed, epidemic-like models have been routinely applied in numerous studies. However, recent empirical analyses (1–4), as well as our own in this paper, suggest that naive epidemic-like models typically fail to generate the structure of realistic diffusion trees: the deep cascades that frequently appear in epidemic models are rarely observed on online social platforms.
My quest to resolve this discrepancy began with an analysis of diffusion patterns in a blog-sharing community, LiveJournal, back in 2015 (4). After I finished my PhD, this project was shelved for a couple of years, until one day in 2018 when I was contacted by Dr. Bin Zhou, then a visiting scholar at Boston University, who inquired about details of the LiveJournal study. He shared with me several promising findings on peer-to-peer diffusion patterns that he had already discovered in two other social networks, which I thought could be key to improving information diffusion models. My curiosity was reignited, and we quickly assembled an international and interdisciplinary team to tackle this problem, aiming to develop a more realistic cascade model that applies to a variety of social platforms.
Information diffusion is fundamentally different from epidemic spread in many ways. In contrast to epidemic processes in which exposures to infection result in passive transmission, social “contagion” is a deliberate action taken by individuals who receive information, involving a number of factors. To begin with, a more granular understanding of how information spreads from person to person is essential for the development of realistic diffusion models.
As a first step, we set about differentiating the propensity of information to diffuse along social ties based on the connectedness of information disseminators and receivers. We analyzed comprehensive diffusion records and the associated social networks in three distinct online social platforms (a blog-sharing community and two microblogging services) and found that the diffusion probability along a social tie follows a power-law relationship with the number of the disseminator's followers and the number of the receiver's followees. Interestingly, the effectiveness of information disseminators in spreading information drops with their degree (i.e., number of connections) consistently across all three platforms. In contrast, the dependence of receivers' responsiveness on their degree differs between the blog-sharing community and the microblogging services, possibly driven by distinct social mechanisms and user behaviors.
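To make the power-law dependence concrete, here is a minimal Python sketch of a degree-dependent diffusion probability along one social tie. The functional form p = c · k_d^(−alpha) · k_r^(beta), the function name, and every parameter value are illustrative assumptions, not the fitted model or estimates from the paper. The disseminator exponent is taken as negative because effectiveness drops with degree; the sign of the receiver exponent is left free because it differed between platform types.

```python
def tie_diffusion_probability(disseminator_followers, receiver_followees,
                              c=0.05, alpha=0.5, beta=-0.3):
    """Hypothetical power-law diffusion probability along one social tie.

    p = c * k_d**(-alpha) * k_r**beta, capped at 1.
    c, alpha and beta are made-up illustrative values, not fitted ones.
    """
    if disseminator_followers < 1 or receiver_followees < 1:
        raise ValueError("degrees must be at least 1")
    p = c * disseminator_followers ** (-alpha) * receiver_followees ** beta
    return min(1.0, p)


# A well-connected disseminator is less effective per tie than a small one.
p_small_hub = tie_diffusion_probability(10, 50)
p_large_hub = tie_diffusion_probability(10_000, 50)
print(p_small_hub, p_large_hub)

# Flipping the sign of beta reverses how receiver degree affects responsiveness.
p_receiver_neg = tie_diffusion_probability(100, 1_000, beta=-0.3)
p_receiver_pos = tie_diffusion_probability(100, 1_000, beta=0.3)
print(p_receiver_neg, p_receiver_pos)
```

On a log-log plot, a relationship of this kind appears as a straight line whose slope is the exponent, which is one common way such a dependence is identified in data.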
We integrated this finding into a cascade model. To our disappointment, the structure of the simulated diffusion trees was still far from real-world observations. This disagreement puzzled me for a while. The eureka moment came when I realized that the observed diffusion events are heavily biased toward the tweets or posts that were successfully disseminated, leading to a significantly overestimated diffusion probability. This observational bias echoes similar issues in my studies on influenza forecasting (5), where disease surveillance is biased toward individuals with more severe symptoms. In disease forecasting, a Bayesian method was employed to correct such bias (6); here, we developed a similar method to adjust for biases and demonstrated that the adjusted models are capable of reproducing key structural features of observed diffusion trees across the three platforms.
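The observational bias described above can be illustrated with a small simulation. The sketch below is a toy setup under stated assumptions: every post is shown to the same number of receivers, each of whom reshares independently with one true probability. Estimating that probability only from posts that were reshared at least once inflates it. The correction shown is a simple zero-truncated binomial inversion, swapped in for illustration; it is not the adjustment method developed in the paper, and all names and values are hypothetical.

```python
import random


def simulate_posts(n_posts, exposures_per_post, true_p, seed=0):
    """Return the number of reshares each post receives (toy assumption:
    every exposure is an independent Bernoulli trial with probability true_p)."""
    rng = random.Random(seed)
    return [sum(rng.random() < true_p for _ in range(exposures_per_post))
            for _ in range(n_posts)]


def estimate_probability(reshare_counts, exposures_per_post, only_shared):
    """Reshares per exposure, optionally using only posts that spread at all."""
    kept = [r for r in reshare_counts if r > 0] if only_shared else reshare_counts
    if not kept:
        return float("nan")
    return sum(kept) / (len(kept) * exposures_per_post)


def correct_truncation(naive_p, exposures_per_post):
    """Invert naive_p = p / (1 - (1 - p)**n) for p by bisection.

    This is the expected naive estimate under zero truncation; the
    right-hand side increases with p, so bisection is sufficient.
    """
    lo, hi = 1e-12, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        expected_naive = mid / (1 - (1 - mid) ** exposures_per_post)
        if expected_naive < naive_p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


TRUE_P = 0.01    # hypothetical per-exposure reshare probability
EXPOSURES = 20   # hypothetical number of receivers per post

counts = simulate_posts(20_000, EXPOSURES, TRUE_P)
full = estimate_probability(counts, EXPOSURES, only_shared=False)
naive = estimate_probability(counts, EXPOSURES, only_shared=True)
corrected = correct_truncation(naive, EXPOSURES)
print(f"true={TRUE_P} all posts={full:.4f} "
      f"shared only={naive:.4f} corrected={corrected:.4f}")
```

In this toy setting the estimate from successfully shared posts is several times larger than the true probability, which is the kind of overestimate that would make simulated cascades look too large and too deep.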
Simulating information diffusion using generative models is notoriously difficult due to the overwhelming complexity involved in social contagion. Our study provides an improved model for information diffusion, but it is far from perfect. More studies are needed to advance our ability to simulate information spread using better-tailored process-based models.
Link to the paper: https://www.nature.com/articles/s41562-020-00945-1
- Liben-Nowell, D. & Kleinberg, J. Tracing information flow on a global scale using Internet chain-letter data. Proc Natl Acad Sci USA 105, 4633–4638 (2008).
- Goel, S., Anderson, A., Hofman, J. & Watts, D. J. The Structural Virality of Online Diffusion. Management Science 62, 180–196 (2015).
- Goel, S., Watts, D. J. & Goldstein, D. G. The structure of online diffusion networks. in Proceedings of the 13th ACM Conference on Electronic Commerce 623–638 (Association for Computing Machinery, 2012). doi:10.1145/2229012.2229058.
- Pei, S., Muchnik, L., Tang, S., Zheng, Z. & Makse, H. A. Exploring the Complex Pattern of Information Spreading in Online Blog Communities. PLOS ONE 10, e0126894 (2015).
- Pei, S., Kandula, S., Yang, W. & Shaman, J. Forecasting the spatial transmission of influenza in the United States. Proc Natl Acad Sci USA 115, 2752–2757 (2018).
- Shaman, J., Karspeck, A., Yang, W., Tamerius, J. & Lipsitch, M. Real-time influenza forecasts during the 2012–2013 season. Nature Communications 4, 2837 (2013).