Query Expansion Using Term Distribution and Term Association

Authors: Dipasree Pal, Kalyankumar Datta, Mandar Mitra
Year of publication: 2018
Source: Second Workshop on Women in Data Science.
Description: Good term selection is an important issue for automatic query expansion (AQE) techniques. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution-based term selection compares the distribution of a term in the (pseudo-)relevant documents with its distribution in the whole corpus or with a random distribution. Two well-known distribution-based methods are based on Kullback-Leibler Divergence (KLD) and Bose-Einstein statistics (Bo1). Association-based term selection, on the other hand, uses information about how a candidate term co-occurs with the original query terms. Local Context Analysis (LCA) and the Relevance-based Language Model (RM3) are examples of association-based methods. Our goal in this study is to investigate how these two classes of methods may be combined to improve retrieval effectiveness. We propose the following combination-based approach. Candidate expansion terms are first obtained using a distribution-based method. This set is then refined based on the strength of the association of the terms with the original query terms. We test our methods on 11 TREC collections. The proposed combinations generally yield better results than each individual method, as well as other state-of-the-art AQE approaches. En route to our primary goal, we also propose some modifications to LCA and Bo1 that lead to improved performance.
Comment: 19 pages, 1 figure, 2 result tables
Database: OpenAIRE
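The two-stage idea described in the abstract (distribution-based candidate selection, then refinement by association with the query terms) can be illustrated with a small Python sketch. It is not the authors' implementation. The KLD score uses the common formulation P_R(t) * log(P_R(t) / P_C(t)), and the Bo1 score uses tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n) with P_n = F / N. The association measure is a hypothetical stand-in (a log-damped count of feedback documents in which the candidate co-occurs with each query term); it is neither LCA, RM3, nor the refinement proposed in the paper. The function names, the parameters n_candidates and n_final, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

Doc = List[str]  # a document as a list of preprocessed tokens


def kld_scores(feedback: Sequence[Doc], collection: Sequence[Doc]) -> Dict[str, float]:
    """Distribution-based score: P_R(t) * log(P_R(t) / P_C(t))."""
    fb_counts = Counter(t for d in feedback for t in d)
    coll_counts = Counter(t for d in collection for t in d)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())
    scores: Dict[str, float] = {}
    for t, c in fb_counts.items():
        if coll_counts[t] == 0:  # feedback docs are assumed to come from the collection
            continue
        p_r = c / fb_total
        p_c = coll_counts[t] / coll_total
        scores[t] = p_r * math.log(p_r / p_c)
    return scores


def bo1_scores(feedback: Sequence[Doc], collection: Sequence[Doc]) -> Dict[str, float]:
    """Bo1 score: tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N."""
    fb_counts = Counter(t for d in feedback for t in d)
    coll_counts = Counter(t for d in collection for t in d)
    n_docs = len(collection)
    scores: Dict[str, float] = {}
    for t, tf_x in fb_counts.items():
        if coll_counts[t] == 0:
            continue
        p_n = coll_counts[t] / n_docs  # F = collection frequency, N = number of documents
        scores[t] = tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return scores


def association(term: str, query: Sequence[str], fb_sets: Sequence[Set[str]]) -> float:
    """Hypothetical association measure: sum over query terms q of
    log(1 + number of feedback docs containing both `term` and q)."""
    return sum(
        math.log(1 + sum(1 for d in fb_sets if term in d and q in d))
        for q in query
    )


def expand_query(
    query: Sequence[str],
    feedback: Sequence[Doc],
    collection: Sequence[Doc],
    n_candidates: int = 10,
    n_final: int = 2,
    method: Callable[[Sequence[Doc], Sequence[Doc]], Dict[str, float]] = kld_scores,
) -> List[str]:
    # Step 1: candidate expansion terms from a distribution-based method.
    dist = method(feedback, collection)
    candidates = sorted(
        (t for t in dist if t not in query), key=lambda t: dist[t], reverse=True
    )[:n_candidates]
    # Step 2: refine by association with the original query terms
    # (ties broken by the distribution score).
    fb_sets = [set(d) for d in feedback]
    assoc = {t: association(t, query, fb_sets) for t in candidates}
    refined = sorted(candidates, key=lambda t: (assoc[t], dist[t]), reverse=True)
    return list(query) + refined[:n_final]


# Hypothetical toy corpus; the first and last documents act as pseudo-relevant feedback.
TOY_COLLECTION: List[Doc] = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "cat", "jungle", "speed"],
    ["car", "engine", "fuel"],
    ["cat", "pet", "food"],
    ["jaguar", "car", "dealer", "engine"],
]
TOY_FEEDBACK = [TOY_COLLECTION[0], TOY_COLLECTION[4]]

if __name__ == "__main__":
    print(expand_query(["jaguar", "car"], TOY_FEEDBACK, TOY_COLLECTION))
    # → ['jaguar', 'car', 'engine', 'dealer']
    print(expand_query(["jaguar", "car"], TOY_FEEDBACK, TOY_COLLECTION, method=bo1_scores))
    # → ['jaguar', 'car', 'engine', 'dealer']
```

With this toy data, both runs first pick "engine", which co-occurs with both query terms in both feedback documents. "dealer" and "speed" tie on the stand-in association measure, so the distribution score decides in favour of "dealer". Swapping in a different association measure only requires replacing the association function.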