Offensive Language and Hate Speech Detection using BERT Model

Authors: Fadila Shely Amalia, Yohanes Suyanto
Language: English, Indonesian
Year of publication: 2024
Subject:
Source: IJCCS (Indonesian Journal of Computing and Cybernetics Systems), Vol 18, Iss 4 (2024)
Document type: article
ISSN: 1978-1520, 2460-7258
DOI: 10.22146/ijccs.99841
Description: Hate speech detection is an important problem in sentiment analysis and natural language processing. This study aims to improve hate speech detection in English text using the BERT model, together with modified preprocessing techniques intended to raise the F1-score. The dataset, sourced from Kaggle, contains English text with hate speech content. Evaluation results show a significant improvement in the model's accuracy and overall text classification performance. The BERT model achieved 89.11% accuracy on the test data, correctly predicting 85 out of 95 samples. The model classifies offensive text well, with around 95% accuracy, but it struggles to separate the hate class from the offensive class and also shows some confusion between the neither and offensive classes. The classification report gives F1-scores of 0.43 for the hate class, 0.94 for the offensive class, and 0.84 for the neither class, with a weighted average F1-score of 0.89 and a macro average of 0.73. These results indicate that the BERT model performs solidly at detecting hate speech, though there is room for improvement, particularly in distinguishing the hate class from the other classes.
Database: Directory of Open Access Journals
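
The description reports both a macro and a weighted average F1-score, and the gap between them (0.73 versus 0.89) is easier to read with the two definitions side by side. The following is a minimal sketch, not the authors' evaluation code: the function names `macro_f1` and `weighted_f1` are hypothetical, and only the three per-class F1 values are taken from the record. The per-class sample counts are not given in the record, so the weighted average is defined here but not evaluated on the reported figures.

```python
def macro_f1(per_class_f1):
    """Unweighted mean of the per-class F1 scores (every class counts equally)."""
    scores = list(per_class_f1.values())
    return sum(scores) / len(scores)


def weighted_f1(per_class_f1, support):
    """Mean of the per-class F1 scores weighted by each class's sample count."""
    total = sum(support[label] for label in per_class_f1)
    weighted_sum = sum(per_class_f1[label] * support[label] for label in per_class_f1)
    return weighted_sum / total


# Per-class F1 scores as reported (already rounded to two decimals).
reported_f1 = {"hate": 0.43, "offensive": 0.94, "neither": 0.84}

macro = macro_f1(reported_f1)
print(f"macro F1 from rounded per-class scores: {macro:.4f}")
```

Recomputing from the rounded per-class scores gives roughly 0.74, which matches the reported 0.73 only up to rounding of the inputs. Because the macro average treats the weak hate class (0.43) the same as the other two, it sits well below the weighted average; a weighted average of 0.89 suggests that the offensive class accounts for most of the test samples, although the record does not state the class counts.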