Piloting a theory-based approach to inferring gender in big data

Author: Jason Radford
Year of publication: 2017
Subject:
Source: IEEE BigData
DOI: 10.1109/bigdata.2017.8258555
Description: Machine learning methods can accurately predict core characteristics of people, such as their gender, age, race, or political orientation. However, prediction models tend not to generalize, offer little explanation for particular corpora, produce weak theory, and suffer from latent biases. In this study, we present an alternative approach to demographic inference that combines sociological theories of gender with machine learning to create high-dimensional measures of gender rather than predict sex. We create measurement models for gender across five corpora: blog posts, tweets, crowdfunding essays, movie scripts, and professional writing. We show that these models validly measure gender in the corpora and then compare their ability to predict author gender with that of standard prediction models. We find that measurement models of gender are as accurate as, and sometimes more accurate than, prediction models. Thus, we show that theory-based measurement models are not only interpretable but also performant.
Database: OpenAIRE