Twitter Universal Dependency Parsing for African-American and Mainstream American English
Author: | Johnny Tian-Zheng Wei, Brendan O'Connor, Su Lin Blodgett |
---|---|
Year of publication: | 2018 |
Subject: |
060201 languages & linguistics;
African american; Parsing; business.industry; Computer science; American English; 06 humanities and the arts; 02 engineering and technology; computer.software_genre; Variety (linguistics); Syntax; Dependency grammar; 0602 languages and literature; 0202 electrical engineering, electronic engineering, information engineering; Mainstream; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing |
Source: | ACL (1) |
DOI: | 10.18653/v1/p18-1131 |
Description: | Due to the presence of both Twitter-specific conventions and non-standard and dialectal language, Twitter presents a significant parsing challenge to current dependency parsing tools. We broaden English dependency parsing to handle social media English, particularly social media African-American English (AAE), by developing and annotating a new dataset of 500 tweets, 250 of which are in AAE, within the Universal Dependencies 2.0 framework. We describe our standards for handling Twitter- and AAE-specific features and evaluate a variety of cross-domain strategies for improving parsing with no, or very little, in-domain labeled data, including a new data synthesis approach. We analyze these methods’ impact on performance disparities between AAE and Mainstream American English tweets, and assess parsing accuracy for specific AAE lexical and syntactic features. Our annotated data and a parsing model are available at: http://slanglab.cs.umass.edu/TwitterAAE/. |
Database: | OpenAIRE |