Query expansion using pseudo relevance feedback based on the Bahasa version of the Wikipedia dataset.

Authors: Husni; Kustiyahningsih, Yeni; Rachman, Fika Hastarita; Rochman, Eka Mala Sari; Yulian, Hadi; Kristiana, Arika Indah; Alfarisi, Ridho
Subject:
Source: AIP Conference Proceedings; 12/24/2022, Vol. 2679 Issue 1, p1-6, 6p
Abstract: Finding documents that are relevant to a user's query in an information retrieval system (IRS) is an interesting area of study. The relevance of the list of documents returned by the IRS is influenced by the accuracy of the method used to calculate the similarity between documents and by the choice of keywords. Many users find it difficult to describe their information needs in words. Sometimes the user enters only one or two words that do not reflect the domain of the information required. This results in a list of documents that is of little relevance to the user's needs. The approach of improving the list of words in the user's query to make it more representative is called Query Expansion (QE). One technique that can be used to expand a query is Pseudo Relevance Feedback. This paper describes the results of research on expanding queries using Pseudo Relevance Feedback on an IRS, based on the Indonesian version of the Wikipedia dataset, which totals about 450 thousand documents. The similarity between the query and the list of tourism news documents is calculated using cosine similarity, while each term is weighted using TF-IDF. The test results show that pseudo-relevance feedback decreases the precision of the IRS by up to 30%. This is due to the failure of the chosen approach to find the right words to expand the original query. The abstracts of articles in Wikipedia are general and are not limited to the tourism domain. The quality of the new query is largely determined by the choice of expansion base dataset, and datasets from the same domain are recommended. It is highly recommended that the QE reference dataset be domain specific and filtered before being used as a QE basis. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
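
Illustrative note: the abstract names three ingredients (TF-IDF term weighting, cosine similarity ranking, and query expansion through pseudo relevance feedback) without giving implementation details. The sketch below shows one common way these pieces can fit together and is not the authors' implementation. The function names, the whitespace tokenizer, the smoothed IDF formula, and the `top_k` and `n_terms` parameters are all assumptions made for illustration, and the three-document corpus used in the usage note is a hypothetical toy example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    return text.lower().split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency (one common variant, assumed here).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    # Raw term frequency times IDF; terms unseen in the corpus are dropped.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def expand_query(query, docs, top_k=2, n_terms=2):
    # Pseudo relevance feedback: treat the top_k ranked documents as relevant
    # and append their n_terms highest-weighted terms to the query.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vecs = [tfidf(t, idf) for t in docs_tokens]

    q_tokens = tokenize(query)
    q_vec = tfidf(q_tokens, idf)
    ranked = sorted(
        ((cosine(q_vec, d), i) for i, d in enumerate(doc_vecs)),
        key=lambda s: (-s[0], s[1]),
    )
    feedback = [i for score, i in ranked[:top_k] if score > 0]

    # Sum TF-IDF weights over the feedback documents.
    pooled = {}
    for i in feedback:
        for term, weight in doc_vecs[i].items():
            pooled[term] = pooled.get(term, 0.0) + weight

    # Highest weight first; ties broken alphabetically for determinism.
    candidates = [
        t
        for t, _ in sorted(pooled.items(), key=lambda kv: (-kv[1], kv[0]))
        if t not in q_tokens
    ]
    return q_tokens + candidates[:n_terms]
```

With the hypothetical corpus `["pantai kuta bali wisata pantai", "candi borobudur wisata sejarah", "resep nasi goreng pedas"]`, calling `expand_query("pantai", docs, top_k=1, n_terms=2)` returns `["pantai", "bali", "kuta"]`. The expansion terms come entirely from whatever the first-pass ranking returns, which is consistent with the abstract's observation that a general-domain expansion dataset can pull in terms unrelated to the target domain and lower precision.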