Classification of Insincere Questions Using Deep Learning: Quora Dataset Case Study

Authors:	Qamar Nawaz, Imran Mumtaz, Iram Aslam, M. Azam Zia, Muhammad Hashim
Year of publication:	2021
Subject:	Feature engineering; Support vector machine; Word embedding; Artificial neural network; business.industry; Computer science; Deep learning; Question answering; The Internet; Artificial intelligence; F1 score; business; Data science
Source:	Proceedings of the Fifteenth International Conference on Management Science and Engineering Management ISBN: 9783030792022
DOI:	10.1007/978-3-030-79203-9_12
Description:	In recent years, the internet has attracted the attention of researchers worldwide and has rapidly become a crucial part of people’s lives, largely because it makes work simpler and easier. Question-answering websites have gained popularity because people need answers to their queries. Such forums may be infiltrated by adversaries or hackers who aim to ruin the forum’s reputation. Detecting questions from such adversaries remains a key challenge for two major reasons: first, the number of users may be affected by such questions, and second, insincere questions may be asked to harm a particular user. For these reasons, this study proposes a deep learning-based solution for insincere question classification. Deep learning was chosen because of its strong results in text classification across various fields. The dataset of Quora questions was collected from the publicly available source Kaggle. The machine learning methods SVM (support vector machine) and logistic regression were compared with a deep learning approach, an LSTM (long short-term memory) neural network. The dataset was preprocessed and then used to train the models, which were trained using MATLAB. The study also considers the impact of using various feature engineering techniques with the models. Extensive experiments were performed and the model parameters were optimized. The F1 score was used to compare model performance because the dataset was highly imbalanced. Our experimental results show that LSTM outperforms the other models in terms of F1 score. The findings of this research would be beneficial for such question-asking forums.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d92a762a92d7acd379e7041e54a82734 https://doi.org/10.1007/978-3-030-79203-9_12
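
The abstract notes that F1 score was chosen over other metrics because the dataset is highly imbalanced. The sketch below illustrates that reasoning on a toy example. It is not the authors' code (the study used MATLAB), and the label counts, the two hypothetical classifiers, and the function names are assumptions made only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed, not from the paper): 100 questions, 6 insincere (label 1).
y_true = [1] * 6 + [0] * 94

# Hypothetical baseline that labels every question as sincere.
always_sincere = [0] * 100

# Hypothetical model: finds 4 of the 6 insincere questions, raises 2 false alarms.
some_model = [1] * 4 + [0] * 2 + [1] * 2 + [0] * 92

for name, y_pred in [("always_sincere", always_sincere), ("some_model", some_model)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(f1_score(y_true, y_pred), 3))
```

On this toy data the trivial baseline reaches 0.94 accuracy while its F1 score is 0, whereas the second classifier scores 0.96 accuracy and about 0.667 F1. Accuracy barely separates the two, which is why F1 is the more informative comparison when the positive class is rare.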