CBDIR: Fast and effective content based document Information Retrieval system

Authors:	Jae Hee Ha, Kyung-Ah Sohn, Young-June Choi, Moon Soo Cha, Soyeon Kim, Min-June Lee
Year of publication:	2015
Subjects:	Information retrieval; Database; Computer science; Human–computer information retrieval; Search engine indexing; Vector space model; Relevance (information retrieval); Visual Word; Document retrieval; Document clustering; Adversarial information retrieval
Source:	ICIS
DOI:	10.1109/icis.2015.7166594
Description:	The continuing growth of information overload has made it hard to obtain valuable information on the web, and the need for effective Information Retrieval (IR) techniques has increased accordingly. Although documents themselves contain far richer information, conventional web services let users search only the title and description. To meet the demand for fast and accurate retrieval of valuable information, we propose a fast and effective content-based document information retrieval system that retrieves information from the actual content of a document. The proposed method is based on the Latent Dirichlet Allocation (LDA) topic model, which is used to extract the major keywords of a given document. The main contributions of our system are increased flexibility, effectiveness, and retrieval speed. Our system can easily communicate with existing web services through the standard JSON format. In addition, we increase retrieval speed by using a NoSQL-based database system with inverted indexing and B-tree-based indexing. We validate the performance of our system on real data collected from the SlideShare service. The proposed system shows better retrieval performance than the existing IR system.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::371504a4769f4eb1294863ff99b67272 https://doi.org/10.1109/icis.2015.7166594
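
The description above mentions keyword-based retrieval through inverted indexing, with results exchanged as JSON. The following is a minimal sketch of that general idea, not the authors' implementation. The document ids, keyword lists, and function names (`build_inverted_index`, `search`, `search_json`) are all hypothetical, and the keywords are assumed to have been extracted beforehand (for example by an LDA topic model, which is not implemented here). A plain in-memory dictionary stands in for the NoSQL store.

```python
import json
from collections import defaultdict

# Hypothetical input: keywords per document, assumed to have been
# extracted already by a topic model such as LDA (not implemented here).
DOC_KEYWORDS = {
    "doc1": ["retrieval", "index", "topic"],
    "doc2": ["topic", "model", "dirichlet"],
    "doc3": ["index", "database", "nosql"],
}


def build_inverted_index(doc_keywords):
    """Map each keyword to the sorted list of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw.lower()].add(doc_id)
    return {kw: sorted(ids) for kw, ids in index.items()}


def search(index, query_terms):
    """Rank documents by the number of query terms matched (ties by id)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term.lower(), []):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def search_json(index, query_terms):
    """Return the ranked results as a JSON string for a web client."""
    results = [{"id": d, "score": s} for d, s in search(index, query_terms)]
    return json.dumps({"query": list(query_terms), "results": results})


# In-memory index standing in for the database-backed index.
INDEX = build_inverted_index(DOC_KEYWORDS)
```

Calling `search_json(INDEX, ["index", "topic"])` returns a JSON string whose `results` list ranks the matching documents by match count. The scoring here is a simple term-overlap count chosen for brevity; the record does not state which ranking function the actual system uses.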