Networks, culture, innovation and inequality
I am intrigued by how the structure of social networks affects outcomes in different social groups and communities. On the one hand, I am interested in how social network structure and cultural integration affect the capacity of (online) communities to develop shared culture and innovate. On the other hand, I am also intrigued by how numerical properties of different social groups engaging in interactions with each other can lead to structural inequalities.
Gendered Accumulation of Fame: Effects of Cumulative Advantage and Network Spillovers on Representation of Dutch Professors in Media.
With Bas Hofstra, In preparation.
Engagement of scientists with the general public
is becoming increasingly important. Experts in the media spotlight wield
considerable power in setting the public agenda. Yet, time in the
spotlight is unequally distributed between women and men in science,
most likely because of unequal access to social network resources that
facilitate access to the media domain. We first set out to understand
whether gender inequalities in media representation are exacerbated by
cumulative advantage where reputable scientists accrue further
recognition, while the unrecognized ones increasingly lag behind.
Second, we suggest the presence of so-called social network
“second-order” cumulative advantage in media representation; the media
popularity of scientists’ social ties (i.e., coauthors, departmental
colleagues, and so forth) flows over to increasingly benefit the focal
scholar. Men and women likely differ in their social network
affiliations with media-savvy colleagues. Further, the advantages of
having similarly popular ties might differ between men and women,
further exacerbating the gender/media representation gap.
“I Heard it on the (Silk)Road”: Structural and Cultural Pathways to Novelty Introduction in an Online Community
With Damiano Morando, In preparation.
Online communities play a crucial role in
generating and spreading innovation, which affects both the online and
offline world. Yet, we still know relatively little about the members
who introduce novelty into the community: the innovators. We explore the
structural and cultural embeddedness of members who introduce linguistic
and knowledge-based innovations into a large online forum. First, we
find that the effect of being structurally embedded in the community is
moderated by the content one is exposed to. That is, less embedded
individuals (i.e., brokers) benefit from access to diverse content in
the online community, while embedded individuals benefit from seeing
similar content. Second, we show that being culturally embedded in the
community has an inverted U-shaped effect on the likelihood of introducing
an innovation. We find evidence for the tension between the
norm-following required for meaningful innovations and the norm-breaking
that pushes one to “think outside of the box”.
Minority group size moderates inequity-reducing strategies in homophilic networks
With Sam Zhang, In preparation.
Minorities are often disadvantaged in social
networks, and their disadvantage can arise in the presence of two
ubiquitous features of social networks: homophily and preferential
attachment. This disadvantage often translates to less visibility and
lower access to valuable social capital of minority group members. In
this paper, we use the directed network model with preferential
attachment and homophily (DPAH) to evaluate how minority groups fare in
the presence of additional groups that could bridge the
majority/minority gap, such as majority group members who support the
minority group (allies), and minority group members that are
incorporated into the majority group. Our results show that the marginal
benefit of majority group members becoming allies increases with
minority group size, and larger minority groups reach equity with a
smaller proportion of incorporated members. These results suggest that
interventions on structural inequities on networks can depend
sensitively on the relative sizes of the groups involved. Our models
also reveal the increasing difficulty of a minority group achieving
parity as the group shrinks.
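To make the two mechanisms named above concrete, here is a minimal sketch of directed network growth with preferential attachment and homophily. It is a simplified two-group illustration, not the DPAH model as used in the paper: it has no allies or incorporated members, and the function names and the parameters `minority_fraction`, `homophily`, and `links_per_node` are assumptions introduced for this example.

```python
import random

def grow_network(n_nodes, minority_fraction, homophily, links_per_node=2, seed=0):
    """Grow a directed network node by node; return (groups, in_degree).

    Illustrative assumptions (not taken from the paper):
    - each new node is a minority member with probability `minority_fraction`;
    - it links to an existing node with weight
      (in-degree + 1) * h, where h = homophily for same-group targets
      and 1 - homophily for other-group targets.
    """
    rng = random.Random(seed)
    groups = []     # group label per node: "min" or "maj"
    in_degree = []  # number of incoming links per node
    for new in range(n_nodes):
        group = "min" if rng.random() < minority_fraction else "maj"
        if new > 0:
            weights = [
                (in_degree[t] + 1) * (homophily if groups[t] == group else 1 - homophily)
                for t in range(new)
            ]
            # With extreme homophily there may be no eligible target yet.
            if sum(weights) > 0:
                k = min(links_per_node, new)
                # Sampling is with replacement; duplicates are dropped,
                # so a node may create fewer than `links_per_node` links.
                for t in set(rng.choices(range(new), weights=weights, k=k)):
                    in_degree[t] += 1
        groups.append(group)
        in_degree.append(0)
    return groups, in_degree

def minority_in_degree_share(groups, in_degree):
    """Share of all incoming links that point to minority nodes."""
    total = sum(in_degree)
    if total == 0:
        return 0.0
    return sum(d for g, d in zip(groups, in_degree) if g == "min") / total

if __name__ == "__main__":
    groups, in_degree = grow_network(500, minority_fraction=0.2, homophily=0.8)
    print("minority size share:", groups.count("min") / len(groups))
    print("minority in-degree share:", minority_in_degree_share(groups, in_degree))
```

Comparing the minority's share of incoming links with its share of nodes gives one rough visibility measure; varying `minority_fraction` shows how this gap depends on relative group size in this toy setup.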
Text mining for sociological research
I am interested in better understanding how meaning can be extracted from large amounts of textual data and used in sociological theory building and testing. I examine how automatic text analysis can be used by social scientists and explore the benefits and limitations of working with big data sets and computational methods for text analysis.
The dangers of using proprietary LLMs for research
With Etienne Ollion, Rubing Shen and Arnault Chatelain. Article published in Nature Machine Intelligence. Available here.
Text mining individual states in short texts
With Wojtek Przepiorka, Forthcoming in Behavior Research Methods. Preprint available here.
Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals’ internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders’ evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets while providing best-practice recommendations.
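As an illustration of how an automatic method can be scored against trained human coders, the sketch below applies a keyword dictionary to short texts and computes precision, recall, and F1 against human codes. The dictionary, the texts, and the labels are invented toy examples for this page, not data or code from the study.

```python
import re

def dictionary_code(text, dictionary):
    """Code a text as positive if it contains any dictionary word."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return bool(tokens & dictionary)

def precision_recall_f1(predicted, human):
    """Compare automatic codes with human codes (both lists of booleans)."""
    tp = sum(p and h for p, h in zip(predicted, human))
    fp = sum(p and not h for p, h in zip(predicted, human))
    fn = sum(h and not p for p, h in zip(predicted, human))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy dictionary and texts for the category "anger".
ANGER_WORDS = {"angry", "furious", "mad", "hate"}
TEXTS = [
    "I am furious, the package never arrived",  # human code: anger
    "Mad respect, best vendor here",            # human code: no anger
    "This is unacceptable, never again",        # human code: anger
    "Arrived on time, thanks",                  # human code: no anger
]
HUMAN_CODES = [True, False, True, False]

if __name__ == "__main__":
    predicted = [dictionary_code(t, ANGER_WORDS) for t in TEXTS]
    print(precision_recall_f1(predicted, HUMAN_CODES))  # (0.5, 0.5, 0.5)
```

In this toy case the dictionary produces one false positive ("mad respect") and misses one text that expresses anger without a listed keyword, which is the kind of error pattern such a comparison is meant to reveal.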
Text mining for social science – The state and the future of computational text analysis in sociology
Article published in Social Science Research. Available here.
The emergence of big data and computational tools has introduced new possibilities for using large-scale textual sources in sociological research. Recent work in sociology of culture, science, and economic sociology has shown how computational text analysis can be used in theory building and testing. This review starts with an introduction to the history of computer-assisted text analysis in sociology and then proceeds to discuss five families of computational methods used in contemporary research. Using exemplary studies, it shows how dictionary methods, semantic and network analysis tools, language models, unsupervised, and supervised machine learning can assist sociologists with different analytical tasks. After presenting recent methodological developments, this review summarizes several important implications of using large datasets and computational methods to infer complex meaning in texts. Finally, it calls on researchers from different methodological traditions to adopt text mining tools while remaining mindful of lessons learned from working with conventional data and methods.
ChatGPT for Text Annotation? Mind the Hype!
With Etienne Ollion, Rubing Shen and Arnault Chatelain, In preparation. Preprint available here.
In recent months, researchers have enthusiastically discussed the relevance of zero- or few-shot classifiers like ChatGPT for text annotation. Should these models prove to be performant, they would open up new continents for research, and beyond. To assess the merits and limits of this approach, we conducted a systematic literature review. Reading all the articles doing zero- or few-shot text annotation in the human and social sciences, we found that these few-shot learners offer enticing, yet mixed results on text annotation tasks. The performance scores can vary widely, with some being average and some being very low. Besides, zero- or few-shot models are often outperformed by models fine-tuned with human annotations. Our findings thus suggest that, to date, the evidence about their effectiveness remains partial, but also that their use raises several important questions about the reproducibility of results, about privacy and copyright issues, and about the primacy of the English language. While we definitely believe that there are numerous ways to harness this powerful technology productively, we also need to harness it without falling for the hype.
Cooperation and trust in extra-legal contexts
I explore how agents sustain mutually beneficial cooperative relations in adverse contexts. Illegal online marketplaces (cryptomarkets) present a unique context for studying the emergence of cooperation in a context where legal controls are absent, social actors are highly anonymous, and the overall levels of trust are low. With collaborators, I study how socio-technical solutions (such as reputation systems) and informal mechanisms of social control (such as gossip) allow individuals to establish trade in such adverse extra-legal environments.
The Moral Embeddedness of Cryptomarkets: Text Mining Feedback on Economic Exchanges on the Dark Web.
With Wojtek Przepiorka, Socio-Economic Review, mwad069. doi: 10.1093/ser/mwad069
In this work, we explore how reputation-based
online markets have shifted the role that psychological mechanisms play
in promoting mutually cooperative market exchange from the stage of
exchange to the stage of sharing information about other traders in the
market. We use text mining to infer reasons traders had for sharing
reputation information about other traders in illegal online
marketplaces and zoom in on the essential role of moral norms in solving
the second-order cooperation problem in large-scale anonymous internet
marketplaces.
Governing Through Gossip: The Role of Informal Communication in Reputation-Based Online Markets.
With Wojtek Przepiorka and Vincent Buskens, Under Review.
Norm emergence and change
I am interested in how norms emerge to promote or block successful cooperation in societies. In a project with collaborators, I use agent-based models to explore how norms that signal group belonging emerge to solve cooperation problems in contexts where multiple groups with conflicting interests encounter each other. In such contexts, signalling one’s group behaviour can help establish parochial cooperation and benefit exploited minority groups. I am further interested in understanding conditions under which such signalling norms become inefficient and damage the well-being of the groups that enforce them.
Signals of Belonging: Emergence of Signalling Norms as Facilitators of Trust and Parochial Cooperation
With Milena Tsvetkova, Wojtek Przepiorka, and Vincent Buskens. Philosophical Transactions of the Royal Society B 379, 20230029. Available here.
Mechanisms of social control reinforce norms that
appear harmful or wasteful, such as mutilation practices or extensive
body tattoos. We suggest such norms arise to serve as signals that
distinguish between ingroup “friends” and outgroup “foes”, facilitating
parochial cooperation. Combining insights from research on signalling
and parochial cooperation, we incorporate a trust game with signalling
in an agent-based model to study the dynamics of signalling norm
emergence in groups with conflicting interests. Our results show that
costly signalling norms emerge from random acts of signalling in
minority groups that benefit most from parochial cooperation. Majority
groups are less likely to develop costly signalling norms. Yet, norms
that prescribe sending costless group identity signals can easily emerge
in groups of all sizes – albeit, at times, at the expense of minority
group members. Further, the dynamics of signalling norm emergence differ
across signal costs, relative group sizes, and levels of ingroup
assortment. Our findings provide theoretical insights into norm
evolution in contexts where groups develop identity markers in response
to environmental challenges that put their interests at odds with the
interests of other groups. Such contexts arise in zones of ethnic
conflict or during contestations of existing power relations.
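The basic logic of a trust game with a costly group signal can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the agent-based model from the article: the payoff values, the function names, and the fixed decision rules (trust only a partner showing one's own group signal, honour trust only from ingroup members) are hypothetical choices made for this example.

```python
def trust_game_payoffs(trusts, honours, signals, signal_cost,
                       P=1.0, R=2.0, S=0.0, T=3.0):
    """Return (truster_payoff, trustee_payoff) for one interaction.

    Assumed payoffs follow the usual trust game ordering T > R > P > S:
    no trust gives (P, P), honoured trust gives (R, R),
    abused trust gives (S, T). A signalling trustee pays `signal_cost`.
    """
    if not trusts:
        truster, trustee = P, P
    elif honours:
        truster, trustee = R, R
    else:
        truster, trustee = S, T
    if signals:
        trustee -= signal_cost
    return truster, trustee

def parochial_round(truster_group, trustee_group, trustee_signals, signal_cost):
    """One round under illustrative parochial decision rules."""
    same_group = truster_group == trustee_group
    trusts = trustee_signals and same_group  # trust only a visible ingroup signal
    honours = same_group                     # honour trust only for the ingroup
    return trust_game_payoffs(trusts, honours, trustee_signals, signal_cost)

if __name__ == "__main__":
    print(parochial_round("A", "A", True, 0.5))   # (2.0, 1.5)
    print(parochial_round("A", "A", False, 0.5))  # (1.0, 1.0)
    print(parochial_round("A", "B", True, 0.5))   # (1.0, 0.5)
```

Under these assumed rules, signalling pays for a trustee meeting an ingroup truster only when the signal cost is below R - P, and it is a pure loss in outgroup encounters, which gives a rough intuition for why signal cost and group composition matter.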