A probabilistic clustering model for hate speech classification in twitter
Author: | Friday Thomas Ibharalu, Idowu Ademola Osinuga, Adebayo Abayomi-Alli, Femi Emmanuel Ayo, Olusegun Folorunso |
---|---|
Year of publication: | 2021 |
Subject: | 0209 industrial biotechnology; Voice activity detection; Computer science; business.industry; General Engineering; 02 engineering and technology; Bayes classifier; computer.software_genre; Class (biology); Fuzzy logic; Cross-validation; Computer Science Applications; Metadata; ComputingMethodologies_PATTERNRECOGNITION; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; tf–idf; business; Cluster analysis; computer; ComputingMilieux_MISCELLANEOUS; Natural language processing |
Source: | Expert Systems with Applications. 173:114762 |
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2021.114762 |
Description: | The key challenges for automatic hate speech classification in Twitter are the lack of a generic architecture, imprecision, threshold settings and fragmentation issues. Most studies use binary classifiers for hate speech classification, but these classifiers cannot capture other emotions that may overlap the positive and negative classes. Hence, a probabilistic clustering model for hate speech classification in Twitter was developed to address these problems. A metadata extractor was used to collect tweets containing hate speech keywords, and crowd-sourced experts were employed to label the collected tweets into two categories: hate speech and non-hate speech. Feature representation was done with the Term Frequency-Inverse Document Frequency (TF-IDF) model and enhanced with topics inferred by a Bayes classifier. A rule-based clustering method was used to automatically classify real-time tweets into the correct topic clusters. Fuzzy logic was then used for hate speech classification using semantic fuzzy rules and a score computation module. In the evaluation, the developed model performed better in hate speech detection, with an F1-score of 0.9256 under 5-fold cross-validation. Similarly, the developed model for hate speech classification performed better, with an F1-score of 91.5, compared to related models. The developed model also achieved an AUC of 0.9645, indicating a better test than similar methods. A paired-sample t-test validated the efficiency of the developed model for hate speech classification. |
Database: | OpenAIRE |