Negation Detection on Mexican Spanish Tweets: The T-MexNeg Corpus
Authors: | Helena Gómez-Adorno, Sergio-Luis Ojeda-Trueba, Alejandro Pimentel, Brian Aguilar-Vizuet, Gemma Bel-Enguix |
---|---|
Year of publication: | 2021 |
Subject: |
Conditional random field
Technology QH301-705.5 Computer science QC1-999 Mexican Spanish 02 engineering and technology computer.software_genre Annotation Negation 0202 electrical engineering electronic engineering information engineering General Materials Science Social media Biology (General) QD1-999 Instrumentation Fluid Flow and Transfer Processes Structure (mathematical logic) business.industry Physics Process Chemistry and Technology 05 social sciences General Engineering Engineering (General). Civil engineering (General) language.human_language Computer Science Applications Chemistry Identification (information) machine learning negation language 020201 artificial intelligence & image processing Artificial intelligence TA1-2040 0509 other social sciences 050904 information & library sciences business computer Natural language processing Scope (computer science) |
Source: | Applied Sciences, Vol. 11, Issue 9, p. 3880 (2021) |
ISSN: | 2076-3417 |
Description: | In this paper, we introduce the T-MexNeg corpus of Tweets written in Mexican Spanish. It consists of 13,704 Tweets, of which 4,895 contain negation structures. We analyze negation statements embedded in the language used on social media. This paper presents the annotation guidelines along with a novel resource targeted at the negation detection task. The corpus was manually annotated with labels for negation cue, scope, and event. We report the inter-annotator agreement for all components of the negation structure. The resource is freely available. Furthermore, we performed several experiments on automatic negation identification, training a machine learning algorithm on the T-MexNeg corpus and on the SFU ReviewSP-NEG corpus. Comparing two methodologies, one based on a dictionary and the other based on the Conditional Random Fields algorithm, we found that negation identification on Twitter performs worse when the model is trained on the SFU ReviewSP-NEG corpus. Therefore, this paper shows the importance of having resources built specifically to deal with social media language. |
Database: | OpenAIRE |
External link: |
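The description mentions a dictionary-based approach to identifying negation. The sketch below illustrates the general idea only: it tokenizes a tweet and marks every token found in a cue list as a negation cue, using BIO-style labels. The cue list, the tokenizer, and the function names `tokenize` and `tag_cues` are hypothetical assumptions for illustration, not the dictionary or code used by the authors.

```python
import re

# Hypothetical cue list of common Spanish negation words.
# This is NOT the dictionary used in the T-MexNeg experiments.
NEGATION_CUES = {"no", "nunca", "jamás", "nadie", "nada", "ni", "tampoco",
                 "ninguno", "ninguna", "ningún"}


def tokenize(text):
    """Lowercase the text and split it into word tokens and single punctuation marks.

    Python's \\w is Unicode-aware, so accented words like "jamás" stay whole.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


def tag_cues(tokens, cues=NEGATION_CUES):
    """Label each token: "B-CUE" if it appears in the cue list, otherwise "O"."""
    return ["B-CUE" if tok in cues else "O" for tok in tokens]


if __name__ == "__main__":
    toks = tokenize("No manches, nunca me contestas")
    for tok, label in zip(toks, tag_cues(toks)):
        print(tok, label)
```

A lookup like this ignores context: any occurrence of a listed word is tagged as a cue, including fixed colloquial expressions where it may not negate anything. It also cannot mark scope or event. A sequence model such as a CRF, which labels each token using features of its neighbors, is one way to take that context into account.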