A probabilistic clustering model for hate speech classification in twitter

Authors:	Friday Thomas Ibharalu, Idowu Ademola Osinuga, Adebayo Abayomi-Alli, Femi Emmanuel Ayo, Olusegun Folorunso
Year of publication:	2021
Subject:	0209 industrial biotechnology; Voice activity detection; Computer science; business.industry; General Engineering; 02 engineering and technology; Bayes classifier; computer.software_genre; Class (biology); Fuzzy logic; Cross-validation; Computer Science Applications; Metadata; ComputingMethodologies_PATTERNRECOGNITION; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; tf–idf; business; Cluster analysis; computer; ComputingMilieux_MISCELLANEOUS; Natural language processing
Source:	Expert Systems with Applications. 173:114762
ISSN:	0957-4174
DOI:	10.1016/j.eswa.2021.114762
Description:	The key challenges for automatic hate speech classification in Twitter are the lack of a generic architecture, imprecision, threshold settings and fragmentation issues. Most studies have used binary classifiers for hate speech classification, but these classifiers cannot capture other emotions that may overlap between the positive and negative classes. Hence, a probabilistic clustering model for hate speech classification in Twitter was developed to tackle these problems. A metadata extractor was used to collect tweets containing hate speech keywords, and crowd-sourced experts were employed to label the collected tweets into two categories: hate speech and non-hate speech. Feature representation was done with the Term Frequency-Inverse Document Frequency (TF-IDF) model and enhanced with topics inferred by a Bayes classifier. A rule-based clustering method was used to automatically classify real-time tweets into the correct topic clusters. Fuzzy logic was then used for hate speech classification using semantic fuzzy rules and a score computation module. In the evaluation, the developed model performed better in hate speech detection, with an F1-score of 0.9256 using 5-fold cross-validation. Similarly, the developed model for hate speech classification performed better than related models, with an F1-score of 91.5. The developed model also attained an AUC of 0.9645, closer to that of a perfect test than the AUC of similar methods. A paired-sample t-test validated the efficiency of the developed model for hate speech classification.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::7ae6deacd24a53d1d8f05520492cd2d6 https://doi.org/10.1016/j.eswa.2021.114762