Piloting a theory-based approach to inferring gender in big data
Author: | Jason Radford |
---|---|
Year of publication: | 2017 |
Subject: | 060201 languages & linguistics; Sociological theory; Computer science; Big data; Inference; 06 humanities and the arts; Professional writing; Biology and political orientation; Race (biology); Scripting language; 0602 languages and literature; Artificial intelligence; Natural language processing |
Source: | IEEE BigData |
DOI: | 10.1109/bigdata.2017.8258555 |
Description: | Machine learning methods can accurately predict core characteristics of people, such as their gender, age, race, or political orientation. However, prediction models tend not to generalize, offer little explanation for particular corpora, produce weak theory, and suffer from latent biases. In this study, we present an alternative approach to demographic inference that combines sociological theories of gender with machine learning to create high-dimensional measures of gender rather than predict sex. We create measurement models for gender across five corpora: blog posts, tweets, crowdfunding essays, movie scripts, and professional writing. We show that these models validly measure gender in the corpora and then compare their ability to predict author gender with that of standard prediction models. We find that measurement models of gender are as accurate as, and sometimes more accurate than, prediction models. Thus, we show that theory-based measurement models are not only interpretable but also performant. |
Database: | OpenAIRE |