Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems

Authors:	Jean Jyh-Jiun Shann, Cher-Sheng Cheng, Chung-Ping Chung
Year of publication:	2006
Subject:	Document Identifier; Information retrieval; Computer science; Search engine indexing; ComputerApplications_COMPUTERSINOTHERSYSTEMS; Library and Information Sciences; Management Science and Operations Research; Inverted index; Computer Science Applications; Identifier; Media Technology; Data mining; Cluster analysis; Algorithm; Decoding methods; Information Systems; Coding (social sciences); Data compression
Source:	Information Processing & Management. 42:407-428
ISSN:	0306-4573
DOI:	10.1016/j.ipm.2005.02.002
Description:	This paper presents a size-reduction method for the inverted file, the most suitable indexing structure for an information retrieval system (IRS). In an inverted file, the document identifiers for a given word are usually clustered. This clustering property can be used to reduce the size of the inverted file, but a practical method must offer both good compression and fast decompression. We present a method that speeds up the coding and decoding processes of interpolative coding by using recursion elimination and loop unwinding. We call this method unique-order interpolative coding. It calculates the lower and upper bounds of every document identifier for a binary code without a recursive process, so decompression time is greatly reduced. It also exploits document identifier clustering to compress the inverted file efficiently. Compared with other well-known compression methods, our method provides fast decoding and excellent compression. It can also support a self-indexing strategy. This work therefore provides a feasible way to build a fast and space-economical IRS.
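For orientation, the following is a minimal sketch of classic recursive binary interpolative coding, the baseline that the description says the paper accelerates. It is not the unique-order variant itself, whose details are not given in this record. The function names (`encode_ids`, `decode_ids`) and the use of plain fixed-width binary codes are assumptions made for illustration. The sketch shows how the lower and upper bounds of each document identifier follow from its neighbours, and why a tightly clustered list costs few or no bits.

```python
def _width(low, high):
    # Bits needed to write any offset in [0, high - low]; 0 when low == high.
    return (high - low).bit_length()


def encode_ids(ids, lo, hi):
    """Encode a strictly increasing list of ids drawn from [lo, hi] as a bit string."""
    bits = []

    def rec(seq, lo, hi):
        if not seq:
            return
        mid = len(seq) // 2
        v = seq[mid]
        # mid ids lie below v and len(seq) - mid - 1 lie above it,
        # which narrows the interval v can occupy.
        low = lo + mid
        high = hi - (len(seq) - mid - 1)
        w = _width(low, high)
        if w:
            bits.append(format(v - low, f"0{w}b"))
        rec(seq[:mid], lo, v - 1)
        rec(seq[mid + 1:], v + 1, hi)

    rec(ids, lo, hi)
    return "".join(bits)


def decode_ids(bits, n, lo, hi):
    """Invert encode_ids given the list length n and the same [lo, hi]."""
    pos = 0

    def rec(n, lo, hi):
        nonlocal pos
        if n == 0:
            return []
        mid = n // 2
        low = lo + mid
        high = hi - (n - mid - 1)
        w = _width(low, high)
        v = low + (int(bits[pos:pos + w], 2) if w else 0)
        pos += w
        # Same visiting order as the encoder: middle, then left, then right.
        return rec(mid, lo, v - 1) + [v] + rec(n - mid - 1, v + 1, hi)

    return rec(n, lo, hi)


ids = [3, 8, 9, 11, 12, 13, 17]
code = encode_ids(ids, 1, 20)
restored = decode_ids(code, len(ids), 1, 20)
dense_code = encode_ids(list(range(1, 8)), 1, 7)  # fully clustered list
```

In this sketch a fully clustered list such as 1..7 within the range [1, 7] needs no bits at all, because every identifier's bounds coincide. The recursion in `rec` is the part that, according to the description, the paper replaces with a non-recursive bound calculation.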
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::55eeee0016793c50f0bc3bab0f5292ad https://doi.org/10.1016/j.ipm.2005.02.002