Query Expansion Using Term Distribution and Term Association
Author: | Dipasree Pal, Kalyankumar Datta, Mandar Mitra |
---|---|
Publication year: | 2018 |
Subject: |
FOS: Computer and information sciences
Distribution (number theory); Computer science; Association (object-oriented programming); computer.software_genre; Computer Science - Information Retrieval; Term (time); Set (abstract data type); Query expansion; Language model; Data mining; Divergence (statistics); computer; Information Retrieval (cs.IR); Selection (genetic algorithm) |
Source: | Second Workshop on Women in Data Science. |
Description: | Good term selection is an important issue for an automatic query expansion (AQE) technique. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution-based term selection compares the distribution of a term in the (pseudo) relevant documents with its distribution in the whole corpus or with a random distribution. Two well-known distribution-based methods are based on Kullback-Leibler Divergence (KLD) and Bose-Einstein statistics (Bo1). Association-based term selection, on the other hand, uses information about how a candidate term co-occurs with the original query terms. Local Context Analysis (LCA) and Relevance-based Language Model (RM3) are examples of association-based methods. Our goal in this study is to investigate how these two classes of methods may be combined to improve retrieval effectiveness. We propose the following combination-based approach. Candidate expansion terms are first obtained using a distribution-based method. This set is then refined based on the strength of the association of terms with the original query terms. We test our methods on 11 TREC collections. The proposed combinations generally yield better results than each individual method, as well as other state-of-the-art AQE approaches. En route to our primary goal, we also propose some modifications to LCA and Bo1 which lead to improved performance. Comment: 19 pages, 1 figure, 2 result tables |
Database: | OpenAIRE |
External link: |
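
The two-stage idea summarized in the description (distribution-based candidate selection followed by association-based refinement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the KLD term score used here is the commonly cited form p_fb(t) * log(p_fb(t) / p_corpus(t)), and the association measure is a deliberately simple document-level co-occurrence count standing in for LCA or RM3. The function names (`kld_scores`, `association`, `expand`) and the parameters `n_candidates` and `n_final` are hypothetical, and documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter


def kld_scores(feedback_docs, corpus_docs):
    """Score each feedback term by its contribution to the KL divergence
    between the feedback-document and whole-corpus term distributions."""
    fb = Counter(t for doc in feedback_docs for t in doc)
    co = Counter(t for doc in corpus_docs for t in doc)
    fb_total = sum(fb.values())
    co_total = sum(co.values())
    scores = {}
    for term, count in fb.items():
        p_fb = count / fb_total
        p_co = co[term] / co_total
        # Assumption: feedback docs are drawn from the corpus, so p_co > 0;
        # terms unseen in the corpus are skipped rather than smoothed.
        if p_co > 0:
            scores[term] = p_fb * math.log(p_fb / p_co)
    return scores


def association(term, query_terms, feedback_docs):
    """Toy association strength (an assumption, not LCA or RM3): count the
    (document, query term) pairs in which `term` co-occurs with a query term."""
    return sum(
        1
        for doc in feedback_docs
        for q in query_terms
        if term in doc and q in doc
    )


def expand(query_terms, feedback_docs, corpus_docs, n_candidates=10, n_final=3):
    """Stage 1: take the top candidates by KLD score.
    Stage 2: re-rank those candidates by association with the query."""
    scores = kld_scores(feedback_docs, corpus_docs)
    by_kld = sorted(scores, key=lambda t: (-scores[t], t))
    candidates = [t for t in by_kld if t not in query_terms][:n_candidates]
    ranked = sorted(
        candidates,
        key=lambda t: (-association(t, query_terms, feedback_docs), -scores[t], t),
    )
    return ranked[:n_final]
```

Calling `expand(["query"], feedback_docs, corpus_docs)` with the top-ranked documents of an initial retrieval as `feedback_docs` returns the refined expansion terms. Ties are broken by KLD score and then alphabetically so that the output is deterministic.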