A New Compression Based Index Structure for Efficient Information Retrieval

Author: Mamun, Md. Abdullah al, Hanif, Md., Uddin, Md. Rakib, Ahmed, Tanvir, Islam, Md. Mofizul
Year of publication: 2012
Subject:
Source: International Journal of Science and Technology, Volume 2 No.1, pp. 10-14, January 2012
Document type: Working Paper
Description: Finding the desired information in a large data set is a difficult problem. Information retrieval (IR) is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR system. Nowadays, the exponential growth of information makes the index structure large enough to affect the quality of the IR system. Compressing the index structure is therefore the main contribution of this paper. We compress the document numbers in inverted file entries using a new coding technique based on run-length encoding. Our coding mechanism uses a specified code that acts on top of run-length coding. In our experiments, our coding mechanism compresses on average 67.34% more than the other techniques.
Comment: 5 pages
Database: arXiv
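
Illustration: the record above does not spell out the authors' code, so the following is only a minimal sketch of the general idea it names, namely applying run-length encoding to the document numbers of an inverted file entry. It assumes the document numbers are first turned into gaps between consecutive entries (a common preprocessing step, not confirmed by the record), and all function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Convert strictly increasing document numbers into gaps (assumed step)."""
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps by accumulating the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Plain run-length encoding: (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run length) pairs back into the original sequence."""
    values = []
    for value, count in runs:
        values.extend([value] * count)
    return values


# One inverted file entry: the documents in which a term occurs.
postings = [3, 4, 5, 6, 7, 20, 21, 22]
gaps = to_gaps(postings)
runs = rle_encode(gaps)

# The round trip must reproduce the original document numbers.
assert from_gaps(rle_decode(runs)) == postings
```

Runs of consecutive document numbers become runs of gap 1, which is where run-length encoding pays off. The additional code that the authors layer over run-length coding is not described in this record and is not reproduced here.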