Please tell us about your research interests.
I study the structure of social and behavioral data, working to understand the assumptions that underlie different models of such data. The majority of my work has focused on studying social network data, collected either explicitly or implicitly through the widespread instrumentation of our modern digital lives. Through this work I try to understand how the structure of empirical social networks shapes the behavior of social algorithms — algorithms such as recommendation engines that mediate interactions between people — and how to adapt the design of such algorithms to achieve various goals. I’ve been very fortunate that interest in these sorts of questions has greatly increased since I started working on them at the onset of my Ph.D. a decade ago.
One of my favorite research questions is: when studying the structure of a social network, what is “social” about it? Some observations about social networks are actually true for the vast majority of graphs (how mathematicians refer to networks) from some non-specific family. For example, the “friendship paradox,” the phenomenon attributed to sociologist Scott Feld whereby “your friends have more friends than you,” is more a mathematical property of graphs than some sort of social phenomenon. The friendship paradox will arise in any social network, whether it be derived from communication data or proximity data or an explicit questionnaire about friendship, as long as there are at least two people who differ in their number of connections (mathematically: if there is non-zero variance in the degree sequence). So the friendship paradox is not a particularly social phenomenon in and of itself.
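As a small illustration of why the friendship paradox is a property of graphs rather than of people, the sketch below compares the average number of connections per node with the average number of connections of a "friend" (a node reached by following an edge). The function name and the toy star and triangle graphs are made up for this example. The second average equals the first plus the degree variance divided by the mean degree, so the gap is strictly positive whenever degrees differ and vanishes when all degrees are equal.

```python
def friendship_paradox(edges):
    """Return (mean degree, mean degree of a friend) for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    mean_degree = sum(degree.values()) / len(degree)

    # Every (node, friend) pair contributes the friend's degree once, so
    # high-degree nodes are counted more often: they are more people's friends.
    friend_degrees = [degree[v] for u in neighbors for v in neighbors[u]]
    mean_friend_degree = sum(friend_degrees) / len(friend_degrees)
    return mean_degree, mean_friend_degree


# Toy graphs (hypothetical): a star has unequal degrees, a triangle does not.
star = [("a", "b"), ("a", "c"), ("a", "d")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]

star_result = friendship_paradox(star)
triangle_result = friendship_paradox(triangle)
print(star_result)      # (1.5, 2.0)
print(triangle_result)  # (2.0, 2.0)
```

Nothing in the computation refers to what the edges mean, which is the point: any edge list with unequal degrees produces the gap.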
My own work on this question (what is “social”?) has included research on the careful application of statistical null models of network data, also known as random graph models, to test social hypotheses. If we want to test a particular hypothesis about social interaction — for example, that birds preferentially interact with birds of the same plumage color — how can we test the significance of that claim based on our data? Basically, an assortativity statistic is computed from our data, and then a random graph model can be used to sample graphs whereby birds interact freely, regardless of plumage, but keeping other properties fixed. The statistic for the observed data can then be compared to the distribution of the statistic for the sampled networks. When Jacob Moreno and Helen Jennings were developing sociometrics way back in the 1930s, they actually developed a process of sampling a so-called “chance sociogram” by drawing numbers out of a hat to rewire their network data in particular ways, a technique that captures the essential idea very well. My recent work in this area has been to mathematically formalize the assumptions that underlie different ways of rewiring such network data so researchers using such tools can better understand the assumptions that are implicit in this suite of analysis methods.
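The testing procedure described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method from any particular paper: the statistic is simply the fraction of interactions between same-colored birds, the null model is a degree-preserving double-edge swap on a simple undirected graph (no self-loops or repeated edges), and the bird names, colors, and function names are invented for illustration. Which graphs the rewiring is allowed to visit is exactly the kind of implicit assumption the paragraph refers to, so a different choice here could change the conclusion.

```python
import random


def same_color_fraction(edges, color):
    """Fraction of edges whose two endpoints share a plumage color."""
    return sum(color[u] == color[v] for u, v in edges) / len(edges)


def rewire(edges, n_swaps, rng):
    """Attempt n_swaps degree-preserving double-edge swaps on a simple graph."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x  # pick one of the two possible re-pairings
        # Proposed swap: (u, v), (x, y) -> (u, x), (v, y)
        if u == x or v == y:
            continue  # would create a self-loop: reject, keep current graph
        new1, new2 = frozenset((u, x)), frozenset((v, y))
        if new1 in present or new2 in present:
            continue  # would create a repeated edge: reject
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {new1, new2}
        edges[i], edges[j] = (u, x), (v, y)
    return edges


def null_model_p_value(edges, color, n_samples=500, seed=0):
    """Compare the observed statistic with its distribution under rewiring."""
    rng = random.Random(seed)
    observed = same_color_fraction(edges, color)
    current = list(edges)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        current = rewire(current, 10 * len(edges), rng)
        if same_color_fraction(current, color) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_samples + 1)


# Hypothetical data: two tightly knit color groups joined by one interaction.
reds = ["r0", "r1", "r2", "r3"]
blues = ["b0", "b1", "b2", "b3"]
edges = [(a, b) for group in (reds, blues)
         for i, a in enumerate(group) for b in group[i + 1:]]
edges.append(("r0", "b0"))
color = {bird: "red" for bird in reds}
color.update({bird: "blue" for bird in blues})

observed, p_value = null_model_p_value(edges, color)
print(observed, p_value)
```

Drawing numbers out of a hat, as Moreno and Jennings did, plays the role of `rng` here; the hard part is deciding which properties of the data the shuffle must leave untouched.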
As a separate thrust of my research, I study how to run and interpret large-scale experiments (randomized trials) in networked settings. This work has a particular focus on the experimental design of algorithms in real-world social systems. Standard methods for causal inference break down in the presence of network interference because the so-called “stable unit treatment value assumption” (SUTVA) is violated. My work in this area began as part of my Ph.D. thesis, which developed “graph cluster randomization” as a method for running experiments to estimate treatment effects under interference. Network experiments are very tricky to design and analyze, and they have been the subject of some very exciting research over the last several years.
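To make the idea of randomizing at the level of clusters concrete, here is a minimal sketch of the assignment step only. It assumes a partition of the network into clusters is already available; the toy network, the cluster labels, and the function names are hypothetical, and the estimation step that would follow is not shown. Assigning whole clusters to the same arm means that people whose connections all lie inside their own cluster see a neighborhood that matches their own assignment, which independent per-person randomization rarely delivers.

```python
import random


def cluster_randomize(cluster_of, p_treat=0.5, seed=0):
    """Assign treatment per cluster, then copy it to every node in the cluster."""
    rng = random.Random(seed)
    arm = {c: rng.random() < p_treat for c in sorted(set(cluster_of.values()))}
    return {node: arm[c] for node, c in cluster_of.items()}


def neighborhood_matches(node, neighbors, treated):
    """True if every neighbor of `node` has the same assignment as `node`."""
    return all(treated[nbr] == treated[node] for nbr in neighbors[node])


# Hypothetical network: two triangles joined by the single edge c-d.
neighbors = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

treated = cluster_randomize(cluster_of)
matched = [n for n in neighbors if neighborhood_matches(n, neighbors, treated)]
print(treated)
print(matched)
```

In this toy case the nodes a, b, e, and f always have matching neighborhoods, while c and d do so only when both clusters happen to land in the same arm.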
What has your journey been to this point?
Circuitous! As an adolescent I was very bent towards science and engineering, seeking to understand the natural and built world around me. For high school I attended an outstanding public science magnet school in New Jersey and I’ve always been very grateful for that opportunity. It also did an excellent job of showing me that there was more to the world than just integrals and integrated circuits. After high school I enrolled at Deep Springs College, a two-year liberal arts program in the California high desert where my studies focused on literature and philosophy more than anything else. While most of my classmates transferred out of Deep Springs to finish humanities degrees elsewhere, I chose to return to technical topics. My parents had recently moved back to Sweden, where they’d emigrated from 25 years prior, and I chose to move with them, enrolling to study applied physics at Lund University in southern Sweden.
The first year of the applied physics program was almost entirely math, which I enjoyed a great deal. When it came time to study the physics that applied this math, I quickly realized that I didn’t really enjoy the physics part. So I changed majors to applied math, with an eventual focus on control theory. I wrote my undergraduate thesis on mathematical models of how oscillatory genetic regulatory networks utilized (or could utilize, in the spirit of synthetic biology) temporal delays to increase the robustness of their oscillations. When I gave a presentation about that work, a biology professor whom I greatly admired told me, paraphrasing, “that’s all very interesting, but that’s not how cells do it.” Missing from my mathematical toolkit was a range of enzymes that effectively achieved much of what I was trying to do with my modeling. I realized that if I wanted to continue working in mathematical biology, I needed to learn much more biology! Certain that I wanted to go to graduate school but feeling very uncertain of what I wanted to study, I enrolled in the one-year “Part III” math masters program at Cambridge.
My first week of classes in England was at the height of the 2008 financial crisis. I opened a bank account at Lloyds the same day the UK bank rescue package was announced. I remember the teller next to me handling a customer who was redistributing his funds across several banks to make sure all his accounts fell under the guarantees of the FSCS (the UK version of FDIC). That year I ended up taking mostly courses in probability and statistics, and I became very interested in ideas of “systemic risk” and more broadly the study of incentives in interdependent technical systems (a hot topic at the intersection of computer science and economics). This interest led me to pursue a Ph.D. (in applied math) at Cornell, which I’d been told (rightly!) had wonderful faculty working in that area.
I remember taking graduate microeconomics my first semester at Cornell (from David Easley) and feeling, for the first time — I had barely studied economics in undergrad — that there were these super interesting economics questions that were layered on top of the mathematical language of the lecture. This experience contrasted very sharply with my lackluster enthusiasm for physics and biology questions in previous stages of my journey. I am enormously grateful to Jon Kleinberg, my thesis advisor at Cornell, for his generous mentorship and support as I developed my research interests under his guidance.
In retrospect, one of the more fortuitous events of my career was when I was rejected from the 2010 summer program at the Santa Fe Institute (SFI) for my first summer as a Ph.D. student. Disappointed, I scrambled to find something to fill the summer. Jon and I had been emailing with Lars Backstrom, his recently graduated Ph.D. advisee who had gone to work at Facebook in 2009, and he suggested that Facebook was open to hosting Ph.D. interns looking to do publishable research at the company. I went on to spend the summers of 2010, 2011, and 2012 at Facebook working with outstanding researchers there doing super exciting work (including papers on “degrees of separation” and the social decision-making involved in the adoption of Facebook itself), and it is undoubtedly this work that I am most known for at this point in my career. I continue to draw a great deal of inspiration from collaborations with colleagues in industry.
I feel super lucky to have received an offer to join the Stanford faculty in 2015. The department I’m in, Management Science & Engineering, is genuinely ideal for my research interests because of its immense breadth: I feel very lucky to be flanked by outstanding social scientists as well as equally outstanding probabilists and optimizers. Stanford has an incredible history of supporting methodological research towards social-scientific applications; I was recently looking for an old paper that wasn’t online, which led me to check out of the library the aged volume “Proceedings of the First Stanford Symposium on Mathematical Methods in the Social Sciences.” That symposium was hosted in 1959 and the volume edited by Kenneth Arrow, Samuel Karlin, and Patrick Suppes. These days there’s a vibrant community of scholars on campus across many departments interested in “computational social science”, and I consider myself very fortunate to benefit from the network effects of having such amazing colleagues that share my specific research interests.
Can you speak to any challenges that you had to overcome?
I have been enormously fortunate in the opportunities I have been afforded and the mentors who have guided me through the above journey, but the above narrative certainly omits many challenges. I think it’s very important to talk about the struggles different people have, as academic careers often demand that one climb a very narrow ridge line, with many more natural pathways leading off to one side or the other. If one wants to climb the academic ridge line, success is highly dependent upon sufficient support in hard times, much more so than the equally fulfilling career and research opportunities in industry and elsewhere. I often think about all the would-be academic colleagues who ended up on other pathways. I think it’s important that we academics try to make academia as welcoming as possible to individuals who aren’t as “lucky” with their opportunities and adversities, so an academic pathway is as available as possible to top students that seek one.
My main challenge came in 2006, at the start of my third year in Lund, when my mother was diagnosed with cancer. She barely survived the initial illness that brought about the diagnosis, and I spent much of the fall of 2006 at my parents’ house outside Stockholm, keeping up with schoolwork as I was able. I forewent trying to do anything professionally “productive” that next summer, opting to spend it at home with my family. But with my mother doing relatively well, I became increasingly unsure how to divide my time between family and personal development. After many conversations with my family about how “this could go on for years”, I arranged (after much angling) to spend the spring of my fourth year abroad at Caltech, nine time zones away. Just as I was planning to leave, my mother had a relapse. I spent less than two weeks in Pasadena before I had to fly back, arriving at her bedside two days before she passed away.
All the while, that past autumn was when I was “supposed to” apply to Ph.D. programs. I had (understandably) been too paralyzed by everything that was going on to assemble any applications. As a result, in March 2008, my life was in disarray as graduation loomed. At the last minute, I found out about the Part III program at Cambridge, which had a deadline in late spring, and I applied. I was fortunately accepted, and was even able to line up a scholarship from a Swedish fund that had a summer deadline.
I gradually figured out how to navigate my new life. I figured out what I wanted to study, applied to several schools, was accepted at Cornell, and the journey continued. Counterfactuals are always hard, but it’s very true that I don’t know what trajectory my life would have taken if I hadn’t been accepted to the Cambridge program, or if I hadn’t had such supportive mentors and family to help me keep pursuing my goals in those difficult times.
These experiences have made me enormously sympathetic towards students, both undergraduate and graduate, who struggle when “life interferes” with their academic or professional pursuits. I try to be the best mentor I can be on this front, aiming to be particularly supportive of the varied ways in which different people process different challenges.
What advice would you give your younger self?
My main advice to my younger self would be very concrete: get experience gathering your own data. No matter how abstractly you’re thinking about a data context, ground-level experience with data collection in that context is almost mandatory to do good work.
As an undergraduate studying applied math, I was essentially trained studying “other people’s data”: lab exercises centered around building experience around the capabilities and limitations of methods as applied to realistic data, but there was seldom any actual data collection involved. Even in my grad school coursework, clean datasets were almost always provided. As a result, I was relatively late to be introduced to the invariable messiness of real-world data contexts, the nuances of which are so central to methodological research.
For me, detailed experience with data collection came when deploying experiments at Facebook, where I came to understand the bizarre ways in which bugs or other instrumentational misconfigurations can easily skew experimental results. This experience has made me a much more careful consumer of other people’s data — I learned what to look out for — and also made me a much better collaborator, reviewer, advisor, and instructor. In digital contexts, significant interesting findings are so often due to bugs. Beware of confirmation bias in both directions, of course! But the essential point of my advice is that I think I would have matured earlier as a researcher if I'd engaged with messy data collection sooner.
What are your predictions for your field in the near future?
The modern area of “computational social science” largely emerged out of the opportunities created by the widespread instrumentation of digital behavior. A simplistic summary of these opportunities since about 2000 is that these datasets (starting with the structure of the world wide web, email datasets, and search engine query log datasets) went through a hugely impactful period of observational analysis. Call this period the “big data” period, if you will. All of a sudden a new generation of scholars had data (not necessarily good data, but data) that spoke to long-standing big social science questions, and these opportunities spawned a flurry of really exciting work. Around the time that I was starting my Ph.D. in 2009, however, the shortcomings of such approaches were beginning to catch up with the opportunity value of the data. See e.g. Shalizi and Thomas’s work on social influence and homophily.
Following that period, over the last decade there has been a quick shift in the literature from “big data” to “big experiments,” noticing that most digital contexts that initially were prized for their instrumentation opportunities also afford opportunities for careful randomized experiments. Today partnering with even a modest digital platform can provide access to hundreds of thousands of potential research subjects in carefully controlled ways. Much of the current experimental work on digital platforms involves fairly basic research designs, and to me, as a methodologist, the exciting opportunities ahead are specifically the exploration of more ambitious designs to try and answer long-standing basic social science questions. The design and analysis of network experiments is an example of an area in this vein, which I hope will see a lot of progress over the next 10 years, but I think there are many adjacent opportunities as well.
Photo credit: Rod Searcey