Natural language processing based identification of Related Short Forum Posts Through Knowledge Based Conceptualization
Authors: | Ajithkumar A. K., J. C. Miraclin Joyce Pamila, R. Senthamil Selvi |
---|---|
Year of publication: | 2021 |
Subject: | 0209 industrial biotechnology; Vocabulary; Information retrieval; Computer science; Search engine indexing; 02 engineering and technology; Semantics; Knowledge-based systems; Identification (information); 020901 industrial engineering & automation; Named-entity recognition; Knowledge base; Semantic similarity; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing |
Source: | 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS) |
DOI: | 10.1109/icais50930.2021.9396051 |
Abstract: | Online communities collaborate, and their users share views, through online forums. The experiences and ideas users share there are rich, but finding relevant forum posts is laborious and frustrating. This research compares a given post with others to find the forum posts related to it. Conventional text-similarity methods perform poorly at finding related content because they do not conceptualize short texts. This paper proposes a novel scheme for identifying related short posts in discussion forums. Whereas existing schemes use a fixed vocabulary, the proposed method dynamically forms a joint word set from the distinct words of each pair of forum posts. A knowledge base is used to derive a raw semantic vector for each post, and the two semantic vectors are then used to compute the pair's semantic similarity. To retrieve relevant posts more efficiently, the framework uses inverted indexing, reducing the search space with synonyms of the words in the post at hand. In the tests conducted, the method found related forum posts with a recall of 90%. The results also indicate that precision can be improved with named-entity recognition (NER). |
Database: | OpenAIRE |
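The pipeline summarized in the abstract can be sketched in Python. It covers a joint word set built per post pair, semantic vectors derived from a knowledge base and compared for similarity, and an inverted index whose lookups are expanded with synonyms. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy synonym table `SYNONYM_GROUPS`, the synonym score of 0.8, the stop-word list, and cosine similarity as the comparison measure are all hypothetical stand-ins, because the abstract does not name the knowledge base or the similarity formula.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy knowledge base (stand-in for the paper's real one):
# each set groups words treated as synonyms.
SYNONYM_GROUPS = [
    {"laptop", "notebook"},
    {"battery", "cell"},
    {"fast", "quick", "rapid"},
]
SYNONYM_SCORE = 0.8  # hypothetical similarity assigned to two synonyms
STOP_WORDS = {"a", "an", "the", "my", "is", "in", "of", "to", "and"}  # hypothetical

def tokenize(text):
    """Lower-case the text and return its alphabetic words minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def synonyms(word):
    """Return the word plus every synonym the toy knowledge base lists for it."""
    result = {word}
    for group in SYNONYM_GROUPS:
        if word in group:
            result |= group
    return result

def word_similarity(a, b):
    """1.0 for identical words, SYNONYM_SCORE for synonyms, otherwise 0.0."""
    if a == b:
        return 1.0
    return SYNONYM_SCORE if b in synonyms(a) else 0.0

def semantic_vector(joint_words, post_words):
    """One entry per joint word: its best similarity to any word of the post."""
    return [max(word_similarity(w, p) for p in post_words) for w in joint_words]

def semantic_similarity(post_a, post_b):
    """Cosine similarity of the two posts' raw semantic vectors."""
    words_a, words_b = set(tokenize(post_a)), set(tokenize(post_b))
    if not words_a or not words_b:
        return 0.0
    # The joint word set is built dynamically from this pair's distinct words.
    joint = sorted(words_a | words_b)
    va = semantic_vector(joint, words_a)
    vb = semantic_vector(joint, words_b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

def build_inverted_index(posts):
    """Map each word to the set of post IDs whose text contains it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index

def related_posts(query, posts, index, top_k=3):
    """Score only posts sharing a word or synonym with the query, best first."""
    candidates = set()
    for word in set(tokenize(query)):
        for variant in synonyms(word):
            candidates |= index.get(variant, set())
    scored = [(semantic_similarity(query, posts[pid]), pid) for pid in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))  # score desc, then ID
    return [pid for _, pid in scored[:top_k]]

if __name__ == "__main__":
    posts = {
        "p1": "Notebook battery runs out quick",
        "p2": "Best pizza recipe",
        "p3": "Laptop screen flickers",
    }
    index = build_inverted_index(posts)
    print(related_posts("Laptop battery drains fast", posts, index))  # ['p1', 'p3']
```

In this example, the pizza post is never scored because no query word or synonym appears in its index entries. That narrowing of the search space is what the abstract attributes to inverted indexing. The notebook post ranks first even though it shares only "battery" verbatim, since the synonym entries lift its semantic vector.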