Networks, culture, innovation and inequality
I am intrigued by how the structure of social networks affects outcomes in different social groups and communities. On the one hand, I am interested in how social network structure and cultural integration affect the capacity of (online) communities to develop shared culture and innovate. On the other hand, I am also intrigued by how numerical properties of different social groups engaging in interactions with each other can lead to structural inequalities.
The Limit to Gender Equity in Science Communication
With Bas Hofstra, In preparation.
Addressing contemporary societal challenges
crucially depends on the public’s engagement with a broad diversity of
scientific perspectives. Here, we overcome prior data limitations to
provide a holistic account of how scientists are represented in
different media types and how gender, academic prominence, collaboration
networks, and past media exposure influence future media exposure. We
study this with scientometric information of the population of Dutch
full professors and, through natural language processing, follow their
careers into three distinct media types: 721 thousand printed news
articles, 695 thousand mentions in online news articles, and 3.3 million
social media posts. Our findings show that women are significantly
underrepresented in traditional printed media. In contrast, their work
garners an equal amount of attention in online news and social media,
highlighting the democratizing potential of these online platforms to
level the playing field for women and men full professors. Compared to
the printed news, online news attention is also more closely tied to
indicators of academic performance. However, online news rarely mentions
scientists by name and women scientists even less so, limiting the
public’s ability to associate research with the women who produce it.
Our results suggest both reason for optimism and concern. Progress
towards gender equity is made in online news and social media. Yet our
findings also show how women scientists’ voices are restrained in the
more heavily gatekept media types like printed newspapers.
“I Heard it on the (Silk)Road”: Structural and Cultural Pathways to Novelty Introduction in an Online Community
With Damiano Morando, In preparation.
Studies of innovation in linguistics, sociology,
management, and organization science have often noted the tension
between innovative processes consisting of unconventional recombination
of diverse pieces of information and those based on thorough, in-depth
engagement with specialized knowledge. The former has often been
associated with brokers, actors who link otherwise disconnected parts of
the network, and the latter with clustered individuals whose contacts
are also connected to each other. However, research on innovation
regularly fails to distinguish between the cultural content of knowledge
individuals engage with and the way their access to knowledge is
facilitated or constrained by their position in a social network. We
argue that this is why the current findings are mixed and suggest that
the two proposed mechanisms are not competing, but rather complementary
explanations of innovation. We theoretically disentangle an individual’s
structural embeddedness, their position in a network of social actors,
from their cultural embeddedness, their engagement with different types
of knowledge that exist in this network. We exploit network techniques
and advanced text-mining methods (topic modelling) to analyze features
of structural and cultural embeddedness of authors of 223 linguistic
innovations introduced over three years in a large online community
forum. We find that the role of structural embeddedness in supporting
innovation crucially hinges on the type of knowledge an individual
accesses given their structural position in the network of social
actors. Our results, further, show that both the recombination and deep
search pathways lead to innovations, but only if backed by both the
appropriate structural position and the engagement with adequate
knowledge.
Embeddings of Nation-Level Social Networks
Minority group size moderates
inequity-reducing strategies in homophilic networks
With Sam Zhang, In preparation.
Minorities are often disadvantaged in social
networks, and their disadvantage can arise in the presence of two
ubiquitous features of social networks: homophily and preferential
attachment. This disadvantage often translates to less visibility and
lower access to valuable social capital of minority group members. In
this paper, we use the directed network model with preferential
attachment and homophily (DPAH) to evaluate how minority groups fare in
the presence of additional groups that could bridge the
majority/minority gap, such as majority group members who support the
minority group (allies), and minority group members that are
incorporated into the majority group. Our results show that the marginal
benefit of majority group members becoming allies increases with
minority group size, and larger minority groups reach equity with a
smaller proportion of incorporated members. These results suggest that
interventions on structural inequities on networks can depend
sensitively on the relative sizes of the groups involved. Our models
also reveal the increasing difficulty of a minority group achieving
parity as the group shrinks.
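The interplay of preferential attachment and homophily described above can be illustrated with a toy simulation. This is a minimal sketch, not the DPAH model used in the paper: it assumes a simplified growth process in which each arriving node sends a fixed number of directed links, choosing targets with probability proportional to (in-degree + 1) times a homophily weight. The function names, parameter names, and default values are hypothetical, and repeated links to the same target are allowed for simplicity.

```python
import random


def grow_network(n_nodes=500, minority_frac=0.2, homophily=0.8,
                 edges_per_node=3, seed=0):
    """Grow a directed network; return (groups, in_degree) lists."""
    if not 0 < homophily < 1:
        raise ValueError("homophily must lie strictly between 0 and 1")
    rng = random.Random(seed)
    groups = []     # groups[i] is "min" or "maj"
    in_degree = []  # in_degree[i] counts links pointing at node i
    for new in range(n_nodes):
        group = "min" if rng.random() < minority_frac else "maj"
        if new > 0:
            # Preferential attachment: popular nodes attract more links.
            # Homophily: same-group targets get weight `homophily`,
            # other-group targets get weight `1 - homophily`.
            weights = [
                (in_degree[j] + 1)
                * (homophily if groups[j] == group else 1 - homophily)
                for j in range(new)
            ]
            # Assumption: sampling with replacement, so multi-links can occur.
            for target in rng.choices(range(new), weights=weights,
                                      k=edges_per_node):
                in_degree[target] += 1
        groups.append(group)
        in_degree.append(0)
    return groups, in_degree


def minority_share_in_top(groups, in_degree, top_frac=0.1):
    """Share of minority nodes among the top `top_frac` nodes by in-degree."""
    k = max(1, int(len(groups) * top_frac))
    ranked = sorted(range(len(groups)), key=lambda i: in_degree[i],
                    reverse=True)
    return sum(groups[i] == "min" for i in ranked[:k]) / k


if __name__ == "__main__":
    for frac in (0.1, 0.3, 0.5):
        groups, in_degree = grow_network(minority_frac=frac)
        share = minority_share_in_top(groups, in_degree)
        print(f"minority fraction {frac:.1f}: "
              f"share of top 10% by in-degree = {share:.2f}")
```

Comparing the minority share among the most-linked nodes with the minority's share of the population gives a rough visibility measure. The sketch does not model allies or incorporated minority members.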
Tanzir Pial, Flavio Hafner, Dakota Handzlik, Enamul
Hassan, Lucas Sage, Tom Emery, Arnout van de Rijt, and Steven Skiena.
Published in the proceedings of the International Conference on Complex
Networks and Their Applications. Preprint available here.
Text mining for sociological research
I am interested in better understanding how meaning can be extracted from large amounts of textual data and used in sociological theory building and testing. I explore how social scientists can use automatic text analysis, and I examine the benefits and limitations of working with big data sets and computational methods for text analysis.
Computational Text Analysis for Building and
Testing Social Theory
With Miriam Hurtado Bodell, Marc Keuschnigg, and Anastasia Menshikova. Article published in KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie. Available here.
Digitization and advances in natural language processing have transformed how sociologists can measure, model, and interpret social life through text. We provide an overview of computational text analysis as a methodological tool kit for building and testing social theory. The field is moving from descriptive uses toward theory-driven and causal inference approaches, though methodological standards—especially around data quality, reproducibility, and causal claims—remain inconsistent. Organizing approaches into data-first, theory-first, and theory–data integration paradigms, we highlight how different methods each balance inductive discovery with theoretical specification. We conceptualize text-analytic methods as measurement strategies that extract sociologically relevant information from unstructured language data and show how they can be incorporated into both thick descriptions and causal inference workflows. Taken together, various computational text analysis approaches offer researchers new opportunities to recover latent constructs, bridge quantitative scale with qualitative depth, and revitalize interpretive approaches in sociology.
The dangers of using proprietary LLMs for
research
With Etienne Ollion, Rubing Shen and Arnault Chatelain. Article published in Nature Machine Intelligence. Available here.
Text mining individual states in short
texts
With Wojtek Przepiorka, Forthcoming in Behavior Research Methods. Preprint available here.
Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals’ internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders’ evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets while providing best-practice recommendations.
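To make the comparison between automatic methods and trained human coders concrete, the following minimal sketch scores a dictionary-based coder against manual codes using precision and recall. The texts, the human codes, and the three-word "anger" dictionary are invented for illustration and are not taken from the study. The example is only meant to show how a dictionary can produce a false positive when a listed word appears in a negated context.

```python
import re


def dictionary_code(text, terms):
    """Return True if any dictionary term occurs as a token in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return bool(tokens & terms)


def precision_recall(predicted, gold):
    """Compare automatic codes with human codes for one binary category."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical short texts with hypothetical human codes for "anger".
texts = [
    "I am so angry about this delay",
    "Not mad at all, great seller",
    "Fast shipping, thanks",
    "This is outrageous, I am furious",
]
human_codes = [True, False, False, True]
anger_terms = {"angry", "mad", "furious"}  # toy dictionary

predicted = [dictionary_code(t, anger_terms) for t in texts]
precision, recall = precision_recall(predicted, human_codes)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy data the dictionary flags the second text because it contains "mad", although the human coder did not code it as anger. That single false positive lowers precision while recall stays perfect.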
Text mining for social science – The state
and the future of computational text analysis in sociology
Article published in Social Science Research. Available here.
The emergence of big data and computational tools has introduced new possibilities for using large-scale textual sources in sociological research. Recent work in sociology of culture, science, and economic sociology has shown how computational text analysis can be used in theory building and testing. This review starts with an introduction of the history of computer-assisted text analysis in sociology and then proceeds to discuss five families of computational methods used in contemporary research. Using exemplary studies, it shows how dictionary methods, semantic and network analysis tools, language models, unsupervised, and supervised machine learning can assist sociologists with different analytical tasks. After presenting recent methodological developments, this review summarizes several important implications of using large datasets and computational methods to infer complex meaning in texts. Finally, it calls on researchers from different methodological traditions to adopt text mining tools while remaining mindful of lessons learned from working with conventional data and methods.
ChatGPT for Text Annotation? Mind the
Hype!
With Etienne Ollion, Rubing Shen and Arnault Chatelain, In preparation. Preprint available here.
In the past months, researchers have enthusiastically discussed the relevance of zero- or few-shot classifiers like ChatGPT for text annotation. Should these models prove to be performant, they would open up new continents for research, and beyond. To assess the merits and limits of this approach, we conducted a systematic literature review. Reading all the articles doing zero- or few-shot text annotation in the human and social sciences, we found that these few-shot learners offer enticing, yet mixed results on text annotation tasks. The performance scores can vary widely, with some being average and some being very low. Besides, zero- or few-shot models are often outperformed by models fine-tuned with human annotations. Our findings thus suggest that, to date, the evidence about their effectiveness remains partial, but also that their use raises several important questions about the reproducibility of results, about privacy and copyright issues, and about the primacy of the English language. While we definitely believe that there are numerous ways to harness this powerful technology productively, we also need to harness it without falling for the hype.
Cooperation and trust in extra-legal contexts
I explore how agents sustain mutually beneficial cooperative relations in adverse contexts. Illegal online marketplaces (cryptomarkets) present a unique context for studying the emergence of cooperation in a context where legal controls are absent, social actors are highly anonymous, and the overall levels of trust are low. With collaborators, I study how socio-technical solutions (such as reputation systems) and informal mechanisms of social control (such as gossip) allow individuals to establish trade in such adverse extra-legal environments.
The Moral Embeddedness of Cryptomarkets:
Text Mining Feedback on Economic Exchanges on the Dark Web.
With Wojtek Przepiorka, Socio-Economic Review, mwad069. doi: 10.1093/ser/mwad069
In this work, we explore how reputation-based
online markets have shifted the role that psychological mechanisms play
in promoting mutually cooperative market exchange from the stage of
exchange to the stage of sharing information about other traders in the
market. We use text mining to infer reasons traders had for sharing
reputation information about other traders in illegal online
marketplaces and zoom in on the essential role of moral norms in solving
the second-order cooperation problem in large-scale anonymous internet
marketplaces.
Governing Through Gossip: The Role of Informal Communication in Reputation-Based Online Markets.
With Wojtek Przepiorka and Vincent Buskens, Under Review.
Norm emergence and change
I am interested in how norms emerge to promote or block successful cooperation in societies. In a project with collaborators, I use agent-based models to explore how norms that signal group belonging emerge to solve cooperation problems in contexts where multiple groups with conflicting interests encounter each other. In such contexts, signalling one’s group behaviour can help establish parochial cooperation and benefit exploited minority groups. I am further interested in understanding conditions under which such signalling norms become inefficient and damage the well-being of the groups that enforce them.
Signals of Belonging: Emergence of
Signalling Norms as Facilitators of Trust and Parochial Cooperation
With Milena Tsvetkova, Wojtek Przepiorka, and Vincent Buskens. Philosophical Transactions of the Royal Society B 379, 20230029. Available here.
Mechanisms of social control reinforce norms that
appear harmful or wasteful, such as mutilation practices or extensive
body tattoos. We suggest such norms arise to serve as signals that
distinguish between ingroup “friends” and outgroup “foes”, facilitating
parochial cooperation. Combining insights from research on signalling
and parochial cooperation, we incorporate a trust game with signalling
in an agent-based model to study the dynamics of signalling norm
emergence in groups with conflicting interests. Our results show that
costly signalling norms emerge from random acts of signalling in
minority groups that benefit most from parochial cooperation. Majority
groups are less likely to develop costly signalling norms. Yet, norms
that prescribe sending costless group identity signals can easily emerge
in groups of all sizes – albeit, at times, at the expense of minority
group members. Further, the dynamics of signalling norm emergence differ
across signal costs, relative group sizes, and levels of ingroup
assortment. Our findings provide theoretical insights into norm
evolution in contexts where groups develop identity markers in response
to environmental challenges that put their interests at odds with the
interests of other groups. Such contexts arise in zones of ethnic
conflict or during contestations of existing power relations.