Networks, culture, innovation and inequality

I am intrigued by how the structure of social networks affects outcomes in different social groups and communities. On the one hand, I am interested in how social network structure and cultural integration affect the capacity of (online) communities to develop shared culture and innovate. On the other hand, I am also intrigued by how numerical properties of different social groups engaging in interactions with each other can lead to structural inequalities.




Gendered Accumulation of Fame: Effects of Cumulative Advantage and Network Spillovers on Representation of Dutch Professors in Media

With Bas Hofstra, In preparation.

Engagement of scientists with the general public is becoming increasingly important. Experts in the media spotlight wield considerable power in setting the public agenda. Yet, time in the spotlight is unequally distributed between women and men in science, most likely because of unequal access to social network resources that facilitate access to the media domain. We first set out to understand whether gender inequalities in media representation are exacerbated by cumulative advantage where reputable scientists accrue further recognition, while the unrecognized ones increasingly lag behind. Second, we suggest the presence of so-called social network “second-order” cumulative advantage in media representation; the media popularity of scientists’ social ties (i.e., coauthors, departmental colleagues, and so forth) flows over to increasingly benefit the focal scholar. Men and women likely differ in their social network affiliations with media-savvy colleagues. Further, the advantages of having similarly popular ties might differ between men and women, further exacerbating the gender/media representation gap.


“I Heard it on the (Silk)Road”: Structural and Cultural Pathways to Novelty Introduction in an Online Community

With Damiano Morando, In preparation.

Online communities play a crucial role in generating and spreading innovation, which affects both the online and offline world. Yet, we still know relatively little about the members who introduce novelty into the community: the innovators. We explore the structural and cultural embeddedness of members who introduce linguistic and knowledge-based innovations into a large online forum. First, we find that the effect of being structurally embedded in the community is moderated by the content one is exposed to. That is, less embedded individuals (i.e., brokers) benefit from access to diverse content in the online community, while embedded individuals benefit from seeing similar content. Second, we show that being culturally embedded in the community has an inverted U-shaped effect on the likelihood of introducing an innovation. We find evidence for the tension between the norm-following required for meaningful innovations and the norm-breaking that pushes one to “think outside of the box”.


Minority group size moderates inequity-reducing strategies in homophilic networks

With Sam Zhang, In preparation.

Minorities are often disadvantaged in social networks, and their disadvantage can arise in the presence of two ubiquitous features of social networks: homophily and preferential attachment. This disadvantage often translates to less visibility and lower access to valuable social capital of minority group members. In this paper, we use the directed network model with preferential attachment and homophily (DPAH) to evaluate how minority groups fare in the presence of additional groups that could bridge the majority/minority gap, such as majority group members who support the minority group (allies), and minority group members that are incorporated into the majority group. Our results show that the marginal benefit of majority group members becoming allies increases with minority group size, and larger minority groups reach equity with a smaller proportion of incorporated members. These results suggest that interventions on structural inequities on networks can depend sensitively on the relative sizes of the groups involved. Our models also reveal the increasing difficulty of a minority group achieving parity as the group shrinks.
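The interplay of homophily and preferential attachment described above can be illustrated with a toy simulation. The sketch below is a simplified stand-in and not the DPAH model used in the paper: the function name `simulate_pa_homophily`, its parameters, and the weighting rule (in-degree plus one, multiplied by a homophily factor) are all assumptions made for illustration.

```python
import random


def simulate_pa_homophily(n_nodes=500, minority_share=0.2, homophily=0.8,
                          edges_per_node=2, seed=0):
    """Grow a directed network node by node.

    Returns (minority share of nodes, minority share of all in-links).
    All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    groups = []      # group label of each node: "min" or "maj"
    in_degree = []   # number of incoming links of each node

    for new in range(n_nodes):
        group = "min" if rng.random() < minority_share else "maj"
        if new > 0:
            # Preferential attachment: popular nodes attract more links.
            # Homophily: same-group targets are weighted by `homophily`,
            # other-group targets by `1 - homophily`.
            weights = [
                (in_degree[t] + 1)
                * (homophily if groups[t] == group else 1 - homophily)
                for t in range(new)
            ]
            if sum(weights) > 0:
                k = min(edges_per_node, new)
                targets = rng.choices(range(new), weights=weights, k=k)
                for t in set(targets):  # ignore repeated draws
                    in_degree[t] += 1
        groups.append(group)
        in_degree.append(0)

    total_in = sum(in_degree)
    minority_in = sum(d for d, g in zip(in_degree, groups) if g == "min")
    node_share = groups.count("min") / n_nodes
    in_share = minority_in / total_in if total_in else 0.0
    return node_share, in_share


if __name__ == "__main__":
    for share in (0.1, 0.3):
        nodes, links = simulate_pa_homophily(minority_share=share)
        print(f"minority nodes: {nodes:.2f}, minority in-links: {links:.2f}")
```

Comparing the minority's share of in-links with its share of nodes across different values of `minority_share` gives a rough sense of how visibility might depend on relative group size in such a setup.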




Text mining for sociological research

I am interested in better understanding how meaning can be extracted from large amounts of textual data and used in sociological theory building and testing. I explore how social scientists can use automatic text analysis, and I examine the benefits and limitations of working with big data sets and computational methods for text analysis.



The dangers of using proprietary LLMs for research

With Etienne Ollion, Rubing Shen and Arnault Chatelain. Article published in Nature Machine Intelligence. Available here



Text mining individual states in short texts

With Wojtek Przepiorka, Forthcoming in Behavior Research Methods. Preprint available here.

Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals’ internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders’ evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets while providing best-practice recommendations.
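To make the comparison against human coders concrete, the sketch below scores a keyword dictionary against manual codes using precision and recall. It is a minimal illustration and not the evaluation pipeline from the paper: the texts, the "positive emotion" category, the keyword list, and the function names are all hypothetical.

```python
def dictionary_code(text, keywords):
    """Code a text as True if any dictionary keyword appears as a token."""
    tokens = set(text.lower().split())
    return any(word in tokens for word in keywords)


def precision_recall(predicted, human):
    """Compare automatic codes with human codes (both lists of booleans)."""
    pairs = list(zip(predicted, human))
    tp = sum(p and h for p, h in pairs)          # true positives
    fp = sum(p and not h for p, h in pairs)      # false positives
    fn = sum(h and not p for p, h in pairs)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: does the text express a positive emotion?
texts = [
    "great product fast shipping",
    "not great at all",          # keyword present, but negated
    "fast delivery thanks",
    "i love this vendor",
]
human = [True, False, False, True]   # codes assigned by a trained coder
keywords = {"great", "love"}         # illustrative dictionary

predicted = [dictionary_code(t, keywords) for t in texts]
precision, recall = precision_recall(predicted, human)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case the dictionary catches every positive text but also flags the negated one, which is the kind of false positive that context-blind keyword matching can produce.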


Text mining for social science – The state and the future of computational text analysis in sociology

Article published in Social Science Research. Available here.

The emergence of big data and computational tools has introduced new possibilities for using large-scale textual sources in sociological research. Recent work in sociology of culture, science, and economic sociology has shown how computational text analysis can be used in theory building and testing. This review starts with an introduction of the history of computer-assisted text analysis in sociology and then proceeds to discuss five families of computational methods used in contemporary research. Using exemplary studies, it shows how dictionary methods, semantic and network analysis tools, language models, unsupervised, and supervised machine learning can assist sociologists with different analytical tasks. After presenting recent methodological developments, this review summarizes several important implications of using large datasets and computational methods to infer complex meaning in texts. Finally, it calls researchers from different methodological traditions to adopt text mining tools while remaining mindful of lessons learned from working with conventional data and methods.


ChatGPT for Text Annotation? Mind the Hype!

With Etienne Ollion, Rubing Shen and Arnault Chatelain, In preparation. Preprint available here.

In the past months, researchers have enthusiastically discussed the relevance of zero- or few-shot classifiers like ChatGPT for text annotation. Should these models prove to be performant, they would open up new continents for research, and beyond. To assess the merits and limits of this approach, we conducted a systematic literature review. Reading all the articles doing zero- or few-shot text annotation in the human and social sciences, we found that these few-shot learners offer enticing, yet mixed results on text annotation tasks. The performance scores can vary widely, with some being average and some being very low. Besides, zero- or few-shot models are often outperformed by models fine-tuned with human annotations. Our findings thus suggest that, to date, the evidence about their effectiveness remains partial, but also that their use raises several important questions about the reproducibility of results, about privacy and copyright issues, and about the primacy of the English language. While we definitely believe that there are numerous ways to harness this powerful technology productively, we also need to harness it without falling for the hype.




Cooperation and trust in extra-legal contexts

I explore how agents sustain mutually beneficial cooperative relations in adverse contexts. Illegal online marketplaces (cryptomarkets) present a unique context for studying the emergence of cooperation in a context where legal controls are absent, social actors are highly anonymous, and the overall levels of trust are low. With collaborators, I study how socio-technical solutions (such as reputation systems) and informal mechanisms of social control (such as gossip) allow individuals to establish trade in such adverse extra-legal environments.




The Moral Embeddedness of Cryptomarkets: Text Mining Feedback on Economic Exchanges on the Dark Web.

With Wojtek Przepiorka, Socio-Economic Review, mwad069. doi: 10.1093/ser/mwad069

In this work, we explore how reputation-based online markets have shifted the role that psychological mechanisms play in promoting mutually cooperative market exchange from the stage of exchange to the stage of sharing information about other traders in the market. We use text mining to infer reasons traders had for sharing reputation information about other traders in illegal online marketplaces and zoom in on the essential role of moral norms in solving the second-order cooperation problem in large-scale anonymous internet marketplaces.


Governing Through Gossip: The Role of Informal Communication in Reputation-Based Online Markets.

With Wojtek Przepiorka and Vincent Buskens, Under Review.

Sharing of reputational information in large-scale online markets has been mainly delegated to socio-technical solutions such as reputation systems. Reputation systems have been shown to incentivize trustworthy trader behavior by collecting and transmitting information about trader reputations. However, beyond relying on quantitative scores in reputation systems, traders in online markets make use of informal community spaces for interactive, gossip-like production of information about trader reputations. We show how reputation information shared in informal communities helps govern online markets for illegal goods through two mechanisms. We combine manual text coding and deep learning language models to complement data on hundreds of thousands of market transactions with sentiments extracted from 1.6 million community texts written about traders in two illegal online markets. First, we find that reputational information shared in informal communities complements formalized reputation systems, thereby supporting informal social control. Second, we observe that community-generated information helps platform administrators to exercise formal control by identifying and excluding untrustworthy sellers from the marketplace. We show how, beyond reliance on sophisticated reputation systems that quantify reputations, traders still rely on informal, gossip-like discussions when governing cooperative exchanges in online markets. Our work sheds light on the ecosystem of bottom-up and top-down mechanisms of social control in large-scale anonymous market environments.




Norm emergence and change

I am interested in how norms emerge to promote or block successful cooperation in societies. In a project with collaborators, I use agent-based models to explore how norms that signal group belonging emerge to solve cooperation problems in contexts where multiple groups with conflicting interests encounter each other. In such contexts, signalling one’s group behaviour can help establish parochial cooperation and benefit exploited minority groups. I am further interested in understanding conditions under which such signalling norms become inefficient and damage the well-being of the groups that enforce them.




Signals of Belonging: Emergence of Signalling Norms as Facilitators of Trust and Parochial Cooperation

With Milena Tsvetkova, Wojtek Przepiorka, and Vincent Buskens.
Philosophical Transactions of the Royal Society B 379, 20230029. Available here.

Mechanisms of social control reinforce norms that appear harmful or wasteful, such as mutilation practices or extensive body tattoos. We suggest such norms arise to serve as signals that distinguish between ingroup “friends” and outgroup “foes”, facilitating parochial cooperation. Combining insights from research on signalling and parochial cooperation, we incorporate a trust game with signalling in an agent-based model to study the dynamics of signalling norm emergence in groups with conflicting interests. Our results show that costly signalling norms emerge from random acts of signalling in minority groups that benefit most from parochial cooperation. Majority groups are less likely to develop costly signalling norms. Yet, norms that prescribe sending costless group identity signals can easily emerge in groups of all sizes – albeit, at times, at the expense of minority group members. Further, the dynamics of signalling norm emergence differ across signal costs, relative group sizes, and levels of ingroup assortment. Our findings provide theoretical insights into norm evolution in contexts where groups develop identity markers in response to environmental challenges that put their interests at odds with the interests of other groups. Such contexts arise in zones of ethnic conflict or during contestations of existing power relations.