ANDES at SemEval-2020 Task 12: A jointly-trained BERT multilingual model for offensive language detection
Authors: | Juan Manuel Pérez, Aymé Arango, Franco M. Luque |
---|---|
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Computation and Language (cs.CL); Natural language processing; Artificial intelligence; Offensive language detection; Language identification; SemEval; Arabic; Turkish; Danish |
Source: | SemEval@COLING |
DOI: | 10.48550/arxiv.2008.06408 |
Description: | This paper describes our participation in SemEval-2020 Task 12: Multilingual Offensive Language Detection. We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all of the proposed languages: English, Danish, Turkish, Greek, and Arabic. Our single model achieved competitive results, performing close to the top systems despite sharing the same parameters across all languages. We also conducted zero-shot and few-shot experiments to analyze transfer performance among these languages. We make our code public for further research. Comment: GitHub repo: https://github.com/finiteautomata/offenseval2020 (a minimal joint fine-tuning sketch follows this record) |
Database: | OpenAIRE |
External link: |
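The description outlines the core idea: fine-tune a single Multilingual BERT model on pooled training data from all five languages so that one set of parameters is shared. Below is a minimal sketch of that setup using the Hugging Face `transformers` library; it is not the authors' code (see the GitHub repo above), and the checkpoint name, hyperparameters, and toy mixed-language data are illustrative assumptions.

```python
# Minimal sketch: joint fine-tuning of Multilingual BERT for binary
# offensive-language classification on data pooled from several languages.
# Illustrative assumptions only; the authors' actual pipeline is in their repo.
import torch
from torch.utils.data import DataLoader
from transformers import BertTokenizerFast, BertForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed mBERT checkpoint

# Toy (text, label) pairs mixed across languages; a real run would pool the
# full OffensEval training sets so one model is updated jointly.
train_data = [
    ("You are all wonderful people", 0),   # English, not offensive
    ("Din idiot, hold din mund", 1),       # Danish, offensive
    ("Bugün hava çok güzel", 0),           # Turkish, not offensive
]

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def collate(batch):
    # Tokenize a mixed-language batch and attach labels for the loss.
    texts, labels = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

loader = DataLoader(train_data, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(1):          # single epoch for the sketch
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)    # loss is computed from the "labels" field
        out.loss.backward()
        optimizer.step()
```

Because every language passes through the same parameters, the fine-tuned model can also be evaluated zero-shot on a language left out of `train_data`, which is the kind of transfer experiment the description mentions.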