A Survey on Content Based Crawling for Deep and Surface Web
Author: | Nishchay Agrawal, Suchi Johari |
---|---|
Year of publication: | 2019 |
Subject: |
0209 industrial biotechnology;
Computer science; Search engine indexing; 02 engineering and technology; Bloom filter; Crawling; World Wide Web; Deep Web; Task (computing); 020901 industrial engineering & automation; Web page; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Web crawler; Semantic Web |
Source: | 2019 Fifth International Conference on Image Information Processing (ICIIP). |
DOI: | 10.1109/iciip47207.2019.8985906 |
Description: | The World Wide Web is a massive source of content, and fetching relevant information from it is a challenging task. Web crawlers play an important role in fetching relevant content from the WWW and in indexing web pages. To accommodate the rapidly increasing volume of user requests, an efficient and optimized crawler is required. The content of surface web pages is directly accessible to all users, whereas the content of the deep web is not exposed to them, which makes crawling the hidden web even more difficult. Various authors have proposed algorithms for different web crawlers that fetch information from the surface and deep web in an efficient and optimized manner. In this paper, we review different web crawlers and classify them according to the information they fetch. The paper provides a comparative analysis of web crawlers that fetch information based on URL, deep web, and surface web. |
Database: | OpenAIRE |
External link: |