How can we manage Offensive Text in Social Media - A Text Classification Approach using LSTM-BOOST

Author:	Md. Anwar Hussen Wadud, Muhammad Mohsin Kabir, M.F. Mridha, M. Ameer Ali, Md. Abdul Hamid, Muhammad Mostafa Monowar
Language:	English
Year of publication:	2022
Subject:	Natural Language Processing; Long Short Term Memory; Adaptive Boosting; Ensemble Learning; Abusive Text; Information technology; T58.5-58.64
Source:	International Journal of Information Management Data Insights, Vol 2, Iss 2, Pp 100095- (2022)
Document type:	article
ISSN:	2667-0968
DOI:	10.1016/j.jjimei.2022.100095
Description:	Recently, offensive content has become increasingly common as a means of harassing and criticizing people on numerous social media platforms. This paper proposes an offensive text classification algorithm named LSTM-BOOST, which employs a Long Short-Term Memory (LSTM) model with ensemble learning to recognize offensive Bengali texts on various social media platforms. The proposed LSTM-BOOST model uses a modified AdaBoost algorithm that employs principal component analysis (PCA) along with LSTM networks. In the LSTM-BOOST model, the dataset is divided into three categories, and PCA and LSTM networks are applied to each of these parts to capture the most significant variance and reduce the weighted error of the model's weak hypotheses. Furthermore, several classifiers are used for baseline experiments, and the model is evaluated with various word embedding methods. Our investigation found that the LSTM-BOOST algorithm outperforms most of the baseline architectures, achieving the leading F1-score of 92.61% on the Bengali offensive text from Social Platforms (BHSSP) dataset.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/68b3de89a2fa4c23b463f1efe12caab3
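
Note:	The description refers to reducing "the weighted error of the weak hypothesis" in an AdaBoost-style ensemble. As a rough illustration only, the sketch below shows one round of standard discrete AdaBoost for labels in {-1, +1}: it computes the weighted error, the hypothesis weight, and the re-normalized sample weights. This is the textbook algorithm, not the paper's modified version; the PCA step and the LSTM weak learner are omitted, and the function name `adaboost_round` and the toy labels are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of standard discrete AdaBoost (labels in {-1, +1}).

    Assumes `weights` sums to 1. Returns (weighted_error, alpha, new_weights).
    This is an illustrative stand-in, not the modified algorithm of the paper.
    """
    # Weighted error of the weak hypothesis: total weight on misclassified samples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)

    # Clamp so that a perfect or fully wrong learner does not cause log(0)
    # or division by zero.
    eps = 1e-10
    err = min(max(err, eps), 1 - eps)

    # Hypothesis weight: larger when the weak learner makes fewer weighted errors.
    alpha = 0.5 * math.log((1 - err) / err)

    # Increase the weight of misclassified samples, decrease the rest,
    # then re-normalize so the weights sum to 1 again.
    unnormalized = [
        w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(unnormalized)
    new_weights = [w / total for w in unnormalized]
    return err, alpha, new_weights


# Toy example: four samples, the weak learner misclassifies the second one.
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]
weights = [0.25, 0.25, 0.25, 0.25]

err, alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(err, alpha, new_weights)
```

In this toy run the single misclassified sample ends up carrying half of the total weight, which is what forces the next weak learner to focus on it.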