Exploring Deep Learning in Semantic Question Matching

Authors:	Sagar Gaire, Ashwin Dhakal, Arpan Poudel, Sagar Pandey, Hari Prasad Baral
Year of publication:	2018
Subject:	Matching (statistics); Stop words; Artificial neural network; Computer science; Deep learning; Lemmatisation; Feature extraction; Semantics; Tokenization (data security); Artificial intelligence; Natural language processing
Source:	2018 IEEE 3rd International Conference on Computing, Communication and Security (ICCCS).
Description:	Question duplication is a major problem for Q&A forums such as Quora, Stack Overflow, and Reddit. Because the same question is posted repeatedly, answers become fragmented across its different versions. This results in poor search, answer fatigue, scattered information, and fewer responses for questioners. Duplicate questions can be detected using machine learning and natural language processing. A dataset of more than 400,000 question pairs provided by Quora is preprocessed through tokenization, lemmatization, and stop-word removal. Features are then extracted from the preprocessed dataset and fed to an artificial neural network designed for the task. The network achieves an accuracy of 86.09%. In short, this research estimates the semantic similarity between question pairs by extracting highly dominant features, and thereby determines the probability that a pair is duplicate.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::9ba4ca117dc2f21b1a9e0bd10dd1832d https://doi.org/10.1109/cccs.2018.8586832