A modified Vector Space Model for semantic information retrieval

Author: Callistus Ireneous Nakpih
Language: English
Year of publication: 2024
Subject:
Source: Natural Language Processing Journal, Vol 8, Iss , Pp 100081- (2024)
Document type: article
ISSN: 2949-7191
DOI: 10.1016/j.nlp.2024.100081
Description: In this research, we present a modified Vector Space Model (VSM) which focuses on the semantic relevance of words for retrieving documents. The modified VSM resolves the problem of the classical model performing only lexical matching of query terms to document terms for retrieval. This limitation prevents the classical model from retrieving documents that do not contain an exact match of the query terms, even if they are semantically relevant to the query. In the modified model, we introduce a Query Relevance Update technique, which pads the original query set with semantically relevant document terms for optimised semantic retrieval results. The modified model also includes a novel tf−p, which replaces the tf−idf technique that the classical VSM uses to compute term frequency weights. Replacing tf−idf resolves the problem of the classical model penalising terms that occur across documents on the assumption that they are stop words; in practice, such words often carry semantic information relevant to document retrieval. We also extend the cosine similarity function with a proportionality weight p_qd, which moderates biases towards high term frequencies in longer documents. The p_qd weight ensures that the frequency of query terms, including the updated ones, is accounted for in proportion to document size in the overall ranking of documents. The simulated results reveal that the modified VSM does achieve semantic retrieval of documents beyond lexical matching of query and document terms.
Database: Directory of Open Access Journals
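
The classical baseline that the abstract argues against can be illustrated with a minimal sketch. It shows only the standard VSM (raw term frequency, idf = log(N / df), cosine similarity), not the paper's tf−p, proportionality weight, or Query Relevance Update, whose formulas are not given in this record. The toy documents, the query, and the helper names (tokenize, tfidf_vectors, cosine) are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems also strip punctuation, stem, etc.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (idf, vectors) with raw term frequency and idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return idf, vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: documents 0 and 1 mean roughly the same thing
# but share no content words with each other.
docs = [
    "the car engine needs repair",
    "the automobile motor needs fixing",
    "the weather is sunny today",
]
idf, doc_vecs = tfidf_vectors(docs)

query_terms = Counter(tokenize("car repair"))
query_vec = {term: tf * idf.get(term, 0.0) for term, tf in query_terms.items()}

scores = [cosine(query_vec, d) for d in doc_vecs]
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

In this sketch the second document scores exactly zero for the query "car repair" even though it is semantically related, because it shares no query term. The term "the", which occurs in every document, receives an idf of zero, which is the kind of cross-document penalty the abstract refers to.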