A Method of Efficient Web Crawling Using URL Pattern Scripts

Author: June-Young Jung, Moon-Soo Chang
Year of publication: 2007
Source: Journal of Korean Institute of Intelligent Systems, 17(6):849-854
ISSN: 1976-9172
DOI: 10.5391/jkiis.2007.17.6.849
Description: It is difficult to collect only the target documents from the innumerable documents on the Web. One solution to this problem is to select target documents from Web sites that serve many documents in the target domain. In this paper, we propose an intelligent crawling method that collects the needed documents based on URL pattern scripts defined in XML. The proposed method applies efficiently to sites that serve structured information stored in a database, one item per page. Using this crawling method, we collected 50,000 Web documents.
Database: OpenAIRE
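The description does not give the format of the URL pattern scripts. As a rough illustration only, the Python sketch below assumes a hypothetical XML script that specifies a URL template, a range of database record IDs, and a regular expression identifying target pages. The element names (`site`, `template`, `range`, `target`), the example.com URLs, and the functions `load_script`, `candidate_urls`, and `is_target` are all assumptions, not the authors' schema or code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical URL pattern script. The element and attribute names are
# illustrative assumptions, not the XML schema defined in the paper.
SCRIPT = r"""
<site name="example">
  <template>http://www.example.com/item?id={id}</template>
  <range start="1" end="3"/>
  <target regex="^http://www\.example\.com/item\?id=\d+$"/>
</site>
"""


def load_script(xml_text):
    """Parse the hypothetical script into a URL template, an ID range, and a compiled regex."""
    root = ET.fromstring(xml_text.strip())
    template = root.findtext("template").strip()
    rng = root.find("range")
    start, end = int(rng.get("start")), int(rng.get("end"))
    target = re.compile(root.find("target").get("regex"))
    return template, range(start, end + 1), target


def candidate_urls(xml_text):
    """Expand the template over the ID range, keeping only URLs the target regex accepts."""
    template, ids, target = load_script(xml_text)
    urls = (template.format(id=i) for i in ids)
    return [u for u in urls if target.match(u)]


def is_target(url, xml_text):
    """Decide whether a link discovered while crawling is a target document."""
    _, _, target = load_script(xml_text)
    return target.match(url) is not None


if __name__ == "__main__":
    for url in candidate_urls(SCRIPT):
        print(url)
```

Run as a script, this prints the three generated item URLs. In a crawl, `is_target` would filter discovered links so that only pages matching the site's pattern are fetched and stored, which is one plausible way a per-site script could restrict collection to target documents.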