Disambiguating and Specifying Social Actors in Big Data: Using Wikipedia as a Data Source for Demographic Information.

Authors: Poschmann, Philipp; Goldenstein, Jan
Source: Sociological Methods & Research; May 2022, Vol. 51, Issue 2, pp. 887-925, 39 pp.
Abstract: Despite the recent and ongoing progress in using text-mining tools to automatically analyze large text corpora, there remains significant potential to facilitate the study of social action in social science research. In this context, particularly the disambiguation (who is referred to in a text?) and specification (which demographic characteristics are present?) of social actors—currently a manual job—remains a challenge. This article demonstrates a reliable and accurate software architecture for social scientists who are interested in automatically detecting, disambiguating, and demographically specifying social actors (i.e., persons and organizations) in large text collections. The backbone of our software architecture is the online encyclopedia Wikipedia as a currently unexploited data source of a large amount of accurately prepared information. We illustrate how our software architecture detects and disambiguates social actors in large text corpora and retrieves their respective demographic information. Overall, we evaluate the reliability and accuracy of our software architecture across seven different social settings and facilitate an intuitive sense of the comprehensive applicability of our software architecture. We end by not only highlighting the benefits of our software architecture for social science research but also pointing to the limitations of using Wikipedia as a data source. [ABSTRACT FROM AUTHOR]
Database: Complementary Index