Semi-Automatic Dataset Annotation Applied to Automatic Violent Message Detection

Authors:	Beatriz Botella-Gil, Robiert Sepulveda-Torres, Alba Bonet-Jover, Patricio Martinez-Barco, Estela Saquete
Language:	English
Year of publication:	2024
Subject:	Natural language processing; violent language; hate speech detection; assisted annotation; dataset construction; human-in-the-loop; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 19651-19664 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3361404
Description:	Annotated corpora are indispensable for training computational models in Artificial Intelligence and Natural Language Processing. However, manual annotation is costly, arduous, and time-consuming, especially when the annotation is semantically complex. To address this problem, this work applies a methodology for semi-automatic dataset annotation based on the Human-in-the-Loop paradigm. The methodology supports building a resource with fine-grained annotation to aid in the detection of violent messages in Spanish sourced from social media (Twitter/X). Applying the proposed methodology for semi-automatic violence annotation yielded a high-quality resource, hereafter referred to as VILLANOS. The methodology annotates the dataset incrementally, which increases annotator efficiency and thereby validates the suitability of the proposal. Annotation time was reduced by 52% compared to manual annotation, and a model trained on the VILLANOS dataset obtains an $F_{1}$ of 85.2%. These results demonstrate the efficiency and effectiveness of the methodology.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/2cbd2265da7a440cbc1f6d40a4a7e136