SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

Authors:	Piotr Andruszkiewicz, Paweł Bujnowski, Zuzanna Bordzicka, Klaudia Firląg, Joanna Kolis, Michał Satława, Jaroslaw Piersa, Katarzyna Beksa, Katarzyna Zamłyńska, Christian Goltz
Publication year:	2021
Subject:	Training set; business.industry; Computer science; Value (computer science); computer.software_genre; SemEval; Task (project management); Language model; Artificial intelligence; business; computer; Natural language processing; Word (computer architecture); Binary word
Source:	SemEval@ACL/IJCNLP
Description:	This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended with toxic comments that were modified and generated by two language models. For toxic word classification, the prediction threshold was optimized separately for every comment in order to maximize the expected F1 value.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::772056e676e4c32837741e8bdb0b071f https://doi.org/10.18653/v1/2021.semeval-1.133
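
Illustration:	The description mentions choosing a prediction threshold separately for every comment so as to maximize the expected F1 value. The record does not say how this was computed, so the following is only a minimal sketch under stated assumptions: each word has an independent toxicity probability, F1 is measured over words rather than over the character offsets used by the task, and a comment with no toxic words and no predictions scores 1. The function names (count_distribution, expected_f1, best_threshold) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def count_distribution(probs: List[float]) -> List[float]:
    """Distribution of the number of toxic words among independent words.

    Returns a list d where d[c] is the probability that exactly c of the
    words are toxic (a Poisson-binomial distribution built by dynamic
    programming).
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for count, mass in enumerate(dist):
            nxt[count] += mass * (1.0 - p)   # this word is not toxic
            nxt[count + 1] += mass * p       # this word is toxic
        dist = nxt
    return dist


def expected_f1(selected: List[float], rest: List[float]) -> float:
    """Expected word-level F1 when the `selected` words are flagged as toxic.

    ASSUMPTION: words are independent, and F1 is 1 when there are neither
    predictions nor toxic words.
    """
    k = len(selected)
    tp_dist = count_distribution(selected)  # true positives among flagged words
    fn_dist = count_distribution(rest)      # toxic words that were not flagged
    total = 0.0
    for tp, p_tp in enumerate(tp_dist):
        for fn, p_fn in enumerate(fn_dist):
            denom = k + tp + fn             # |predicted| + |actually toxic|
            f1 = 1.0 if denom == 0 else 2.0 * tp / denom
            total += p_tp * p_fn * f1
    return total


def best_threshold(word_probs: List[float]) -> Tuple[float, float]:
    """Pick the per-comment threshold that maximizes expected F1.

    Tries flagging the top-k most probable words for every k and returns
    (threshold, expected F1). Words with probability >= threshold are
    flagged; an infinite threshold means "flag nothing". With tied
    probabilities the threshold may flag more than k words.
    """
    ranked = sorted(word_probs, reverse=True)
    best_k, best_score = 0, expected_f1([], ranked)
    for k in range(1, len(ranked) + 1):
        score = expected_f1(ranked[:k], ranked[k:])
        if score > best_score:
            best_k, best_score = k, score
    threshold = ranked[best_k - 1] if best_k > 0 else float("inf")
    return threshold, best_score
```

In this sketch, a comment whose words all have low toxicity probability gets an infinite threshold (nothing is flagged), which is why a single global cutoff can differ from the per-comment choice.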