How I met my collaborators

This short post gives a compressed history of three physicists' 14 years of collaboration on research into collaborative knowledge.

Dec 17, 2018

This article traces, from my personal viewpoint, the path the authors took to my recent paper: https://www.nature.com/articles/s41562-018-0488-z

The story began about 14 years ago. I was lucky to be a freshman at the Korea Advanced Institute of Science and Technology (KAIST), a research-oriented university that most Korean kids who want to be scientists or engineers are eager to enter. At that time, Sang Hoon Lee was a young graduate student and my first Teaching Assistant, in the general physics course. Sometimes, professors assigned us homework to write a short essay in an academic style, with suitable references supporting the arguments. In 2005, Wikipedia was a rising star, yet its credibility was usually doubted. Thus, using Wikipedia as a reference was a sort of taboo. As a TA, Sang Hoon also asked us not to rely on Wikipedia too much; of course, I agreed with him. Like many skeptics, we did not trust Wikipedia.

We were wrong.

According to, well (what a twist!), my own research in collaboration with Sang Hoon 14 years later, Wikipedia has gradually turned out to be more trustworthy than we had ever expected.

A few years after I met Sang Hoon, as luck would have it again, I joined Prof. Hawoong Jeong’s research group on complex systems at KAIST as a Ph.D. student. We mainly focused on the long-term evolution of human knowledge, like many other scholars who have worked on that topic for decades. However, because the available data were limited, we could essentially observe only the fruits of curated knowledge. The detailed process behind the knowledge remained hidden. Just after I wrote a paper about 209 years of word usage in English books with the Google n-gram dataset (J Yun et al, PLOS ONE 2015), around the time I started to think about finishing my Ph.D. and about my own future, my attention moved to epidemiology because of the 2015 Middle East respiratory syndrome (MERS) outbreak in South Korea.

To study epidemiology, I visited Sungkyunkwan University (SKKU). Unfortunately, right after I started to work on that problem, I could feel that the collaboration was about to hit a dead end (alas, it actually did) because of privacy issues in the medical-record dataset. One consolation in my sadness (and my desperation about getting my Ph.D. degree and about my future) was that I met Sang Hoon again. He had just returned to Korea and joined SKKU as a research professor after four years as a postdoc in Europe. We often discussed the evolution of collaborative knowledge (and other, not necessarily academic, things over some beer), a subject we finally admitted we had misjudged a decade earlier. Wikipedia and the other Wikimedia projects have served as a representative playground for sharing knowledge in the twenty-first century. They hold millions of articles and, more importantly, every article carries the collaboration history of its editors, who number in the millions; we now had a chance to trace the exact process behind the formation of knowledge. Hawoong Jeong also expressed interest in the topic. A new project of three statistical physicists began.

For a decade, researchers had mainly investigated Wikipedia as a social network, examining the interactions between editors in a specific timeframe, e.g., "edit wars". Our interest was different. For statistical physicists, scaling laws and universality have always been a "holy grail". Naturally, we decided to look for universalities in Wikipedia (and its sibling Wikimedia projects). The English Wikipedia, the largest among them, was the obvious first target. In early 2014, the English Wikipedia contained 34 million items with 587 million editing events, and comparing so many articles directly was almost impossible. We thus used a physicist's solution: defining suitable measures for the analysis of growing articles. With an appropriate time rescaling, we found that all articles fall into just four categories based on our new measures: the ratios among the number of editors, the number of edits, and the text size of an article (J Yun et al, PRE 2016: https://journals.aps.org/pre/abstract/10.1103/PhysRevE.93.012307). An important finding was that a relatively small number of editors exert most of the influence on most articles; in other words, we witnessed the iron law of oligarchy.



Illustration describing our previous work (PRE 2016) in the Danish newspaper Weekendavisen

After the initial success, and a good deal of media attention toward our work (worldwide, in fact, as the figure above indicates), we dreamed of a bigger future: finding pan-human universality in human knowledge, in terms of the iron law of oligarchy. Wikimedia research had conventionally focused on the English edition. Even though some other language editions of Wikipedia had been studied, only the largest ones, i.e., French, Spanish, and German, had been considered; smaller editions and other Wikimedia projects were commonly neglected. In addition, the commonalities and differences between online collaborative knowledge (i.e., Wikimedia projects) and conventional collaborative knowledge (e.g., patents and scientific papers) were unexplored.

To fill this obvious gap in the research, and partly because we are researchers from the non-Western world whose native language (Korean) is not one of those “popular” languages, we decided to collect the entire history of 863 different Wikimedia projects, covering an exhaustive list of languages and purposes, which contain more than 2.5 billion edits from 100 million editors. We also gathered 20 years of scientific publishing history from SCOPUS and of patent applications from patent offices worldwide, as requested by a considerate anonymous reviewer during the review process.

Surprisingly (or not so surprisingly, depending on the perspective), the extensive data sets gave us evidence for a universal growth pattern, not only for the online media but also for patents and scientific papers. The growth of a system slows down as more edits are performed. Moreover, we again observed severe inequality in contribution among Wikimedia editors, which is established early in all Wikimedia projects and persists to this day. The inequality also exists for papers and patents, yet it is not as severe as in the Wikimedia projects. In the end, we discovered that Wikipedia is not the utopia we had dreamed of. We also developed a simple model that elucidates the mechanism behind the oligopoly. According to the model and the data, new editors cannot survive in Wikipedia in its current state, and some effort is needed to sustain a fertile collaborative environment.
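For readers curious how "inequality in contribution" can be turned into a number, one common choice is the Gini coefficient of the per-editor edit counts (0 means every editor contributes equally; values near 1 mean a few editors do almost everything). The Python sketch below is only an illustration under that assumption: the function name `gini` and the toy edit counts are hypothetical, and this post does not state that the paper's analysis was carried out this way.

```python
def gini(contributions):
    """Gini coefficient of non-negative contribution counts (hypothetical helper).

    Uses the standard formula on values sorted in ascending order:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  with i = 1..n
    """
    values = sorted(contributions)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # No editors or no edits: treat as perfectly equal by convention.
        return 0.0
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy data (made up): number of edits per editor on one article.
equal_article = [5, 5, 5, 5]
skewed_article = [0, 0, 0, 10]

print(gini(equal_article))   # equal contributions
print(gini(skewed_article))  # one editor made every edit
```

With only four editors the coefficient cannot reach 1 even when a single editor makes every edit; the maximum for n editors is (n - 1) / n, so comparisons across articles with very different numbers of editors need some care.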

In short, this research is the result of 14 years of thinking about human knowledge by me and my collaborators. It provides important insights into how human knowledge is formed. Yet open questions remain, and I hope that someone will continue to work on this topic, ourselves included. This direction of research will remain fundamental as long as human civilization continues on the foundation built by knowledge formation. Long live human society!

Jinhyuk Yun

Senior Research Scientist, KISTI

I am a data scientist at the Korea Institute of Science and Technology Information. I am interested in the structure and dynamics of complex systems such as society, culture, media, and collective knowledge. For this purpose, I generally handle massive datasets and detect the hidden patterns beneath their external appearance. Currently I am working on: Modeling Human Behavior, Network Science, Data Science, Computational Social Science, Cultural Science, Scientometrics, Citation Dynamics, and so on.
