Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns
Authors: | Jason Baldridge, Kellie Webster, Vera Axelrod, Marta Recasens |
---|---|
Year of publication: | 2018 |
Subject: |
FOS: Computer and information sciences
Linguistics and Language; Coreference; Computer Science - Computation and Language; Computer science; business.industry; Communication; Natural language understanding; 02 engineering and technology; 010501 environmental sciences; Resolution (logic); computer.software_genre; 01 natural sciences; Computer Science Applications; Task (project management); Human-Computer Interaction; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; Computation and Language (cs.CL); computer; Natural language processing; 0105 earth and related environmental sciences |
Source: | Transactions of the Association for Computational Linguistics. 6:605-617 |
ISSN: | 2307-387X |
DOI: | 10.1162/tacl_a_00240 |
Description: | Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines that demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge. |
Database: | OpenAIRE |
External link: |