Negation Detection on Mexican Spanish Tweets: The T-MexNeg Corpus

Authors: Helena Gómez-Adorno, Sergio-Luis Ojeda-Trueba, Alejandro Pimentel, Brian Aguilar-Vizuet, Gemma Bel-Enguix
Publication year: 2021
Subject:
Conditional random field
Technology
QH301-705.5
Computer science
QC1-999
Twitter
Mexican Spanish
02 engineering and technology
Annotation
Negation
0202 electrical engineering, electronic engineering, information engineering
General Materials Science
Social media
Biology (General)
QD1-999
Instrumentation
Fluid Flow and Transfer Processes
Structure (mathematical logic)
Physics
Process Chemistry and Technology
05 social sciences
General Engineering
Engineering (General). Civil engineering (General)
Computer Science Applications
Chemistry
Identification (information)
machine learning
020201 artificial intelligence & image processing
Artificial intelligence
TA1-2040
0509 other social sciences
050904 information & library sciences
Natural language processing
Scope (computer science)
Source: Applied Sciences, Volume 11, Issue 9, p. 3880 (2021)
ISSN: 2076-3417
Description: In this paper, we introduce the T-MexNeg corpus of Tweets written in Mexican Spanish. It consists of 13,704 Tweets, of which 4,895 contain negation structures. We analyze negation statements as they occur in the language used on social media. The paper presents the annotation guidelines along with a novel resource targeted at the negation detection task. The corpus was manually annotated with labels for negation cue, scope, and event. We report the inter-annotator agreement for all components of the negation structure. This resource is freely available. Furthermore, we performed experiments to automatically identify negation, using the T-MexNeg corpus and the SFU ReviewSP-NEG corpus to train a machine learning algorithm. Comparing two methodologies, one based on a dictionary and the other on the Conditional Random Fields algorithm, we found that negation identification results on Twitter are lower when the model is trained on the SFU ReviewSP-NEG corpus. This shows the importance of having resources built specifically to deal with social media language.
Database: OpenAIRE