Autor: |
Md. Anwar Hussen Wadud, Muhammad Mohsin Kabir, M.F. Mridha, M. Ameer Ali, Md. Abdul Hamid, Muhammad Mostafa Monowar |
Jazyk: |
angličtina |
Rok vydání: |
2022 |
Předmět: |
|
Zdroj: |
International Journal of Information Management Data Insights, Vol 2, Iss 2, Pp 100095- (2022) |
Druh dokumentu: |
article |
ISSN: |
2667-0968 |
DOI: |
10.1016/j.jjimei.2022.100095 |
Popis: |
Recently, offensive content has become increasingly popular for harassing and criticizing people on numerous social media platforms. This paper proposes an offensive text classification algorithm named LSTM-BOOST employing Long Short-Term Memory(LSTM) model with ensemble learning to recognize offensive Bengali texts in various social media platforms. The proposed LSTM-BOOST model uses the modified AdaBoost algorithm employing principal component analysis(PCA) along with LSTM networks. In the LSTM-Boost model, the dataset is divided into three categories, and PCA and LSTM networks are applied to each part of the dataset to obtain the most significant variance and reduce the weighted error of the weak hypothesis of the model. Furthermore, different classifiers are used for baseline experiment and the model is evaluated on various word embedding vector methods. Our investigation found that the LSTM-BOOST algorithms outperform most of the baseline architecture, leading F1-score of 92.61% on the Bengali offensive text from Social Platforms(BHSSP) dataset. |
Databáze: |
Directory of Open Access Journals |
Externí odkaz: |
|