Label Propagation-Based Semi-Supervised Learning for Hate Speech Classification

Authors:	Dietrich Klakow, Irina Illina, Dana Ruiter, Ashwin Geet D'Sa, Dominique Fohr
Contributors:	Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS), Saarland University [Saarbrücken], GRID5000
Year of publication:	2020
Subject:	ComputingMethodologies_PATTERNRECOGNITION; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; Computer science; Speech recognition; 0202 electrical engineering, electronic engineering, information engineering; Labeled data; [INFO]Computer Science [cs]; 020206 networking & telecommunications; 02 engineering and technology; Semi-supervised learning; Speech classification; Classifier (UML); Label propagation
Source:	Insights from Negative Results Workshop, EMNLP 2020, Nov 2020, Punta Cana, Dominican Republic
DOI:	10.18653/v1/2020.insights-1.8
Description:	International audience; Research on hate speech classification has received increased attention. In real-life scenarios, only a small amount of labeled hate speech data is available to train a reliable classifier. Semi-supervised learning takes advantage of a small amount of labeled data and a large amount of unlabeled data. In this paper, label propagation-based semi-supervised learning is explored for the task of hate speech classification. The quality of the labels assigned to the unlabeled set depends on the input representations. In this work, we show that pre-trained representations are label-agnostic and, when used with label propagation, yield poor results. Neural network-based fine-tuning can be adopted to learn task-specific representations using a small amount of labeled data. We show that fully fine-tuned representations may not always be the best representations for label propagation, and that intermediate representations may perform better in a semi-supervised setup.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d1e8d5039978be5668b46c98defb2b73 https://doi.org/10.18653/v1/2020.insights-1.8
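
Illustrative note: the abstract describes label propagation only in prose, so the following is a minimal sketch of one standard formulation (iterative propagation over a row-normalised RBF similarity graph, with the labeled points clamped after every step). It is not taken from the paper. The function names `rbf_transition` and `propagate_labels`, the `sigma` and `iterations` parameters, and the toy 2-D points standing in for sentence representations are all assumptions made for illustration.

```python
import math


def rbf_transition(points, sigma=1.0):
    """Build a row-normalised RBF similarity matrix (hypothetical helper)."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else row)
    return transition


def propagate_labels(points, labels, num_classes=2, sigma=1.0, iterations=100):
    """Spread labels from labeled points to unlabeled ones.

    labels[i] is a class index for a labeled point and None otherwise.
    Returns one predicted class index per point.
    """
    n = len(points)
    transition = rbf_transition(points, sigma)

    def clamp(dist):
        # Labeled points keep their known one-hot distribution.
        for i, label in enumerate(labels):
            if label is not None:
                dist[i] = [1.0 if c == label else 0.0 for c in range(num_classes)]
        return dist

    # Unlabeled points start from a uniform class distribution.
    dist = clamp([[1.0 / num_classes] * num_classes for _ in range(n)])
    for _ in range(iterations):
        updated = [
            [sum(transition[i][j] * dist[j][c] for j in range(n))
             for c in range(num_classes)]
            for i in range(n)
        ]
        dist = clamp(updated)
    return [max(range(num_classes), key=lambda c: row[c]) for row in dist]


# Toy data: two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = [0, None, None, 1, None, None]
predicted = propagate_labels(points, labels)
print(predicted)
```

In this sketch the outcome depends entirely on how the points are laid out: when same-class points sit close together, the two labeled seeds are enough to label the rest. That is the dependence on input representations that the abstract points to, here shown only on toy data.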