A new approach to Web Crawling — DHEKTS Crawler in comparison with various Crawlers

Author: K ThirugnanaSambanthan, Commerce, Coimbatore, Tamil Nadu, India.
Year of publication: 2021
Subject:
Source: Indian Journal of Science and Technology. 14:1580-1586
ISSN: 0974-5645
0974-6846
DOI: 10.17485/ijst/v14i19.599
Description: Objectives: To propose a crawler that visits websites to collect information and build a search engine index for reference; to compare the license, implementation language, and effectiveness of various crawlers with those of the proposed DHEKTS crawler; to compare their characteristics, tasks, and functions with those of the DHEKTS crawler; and to identify the merits of the DHEKTS crawler. Methods: A new crawler called DHEKTS was developed to filter and synchronize documents such as images, links, and HTML code from a given website. It is distinctive in that it returns all the details of a particular website: images, links, HTML code, and content. It can crawl through the links on a specified website and crawl further to other links on that website. The DHEKTS crawler is designed for depth and relevance crawling. As a whole, it has a few crawling mechanisms that support a variety of information. The requirements are: operating system, Windows 7 or higher; front end, PHP; back end, MySQL; RAM, minimum 4 GB; server, a high-speed server with good storage capacity. Findings: The DHEKTS crawler retrieved web-related links, images, HTML code, and information up to the fifth level of crawling, and its relevance search returned relevant information. Multiple crawlers fulfill the major functions of crawling, but the DHEKTS crawler is built to execute all of these functions in a single crawler. Applications: The crawler can be applied to crawling various websites to retrieve valuable data. Keywords: Crawler; DHEKTS Crawler; License; tasks; functions; effectiveness; Comparison
Database: OpenAIRE
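
Illustrative sketch: The abstract describes a crawler that collects links, images, and HTML code from one website and follows links up to a fifth level. The Python sketch below shows one way such depth-limited, same-site crawling could look. It is not the DHEKTS implementation, which the abstract says uses PHP and MySQL. The names LinkImageParser, crawl, fetch, and max_depth, and the example.test URLs, are hypothetical. Counting the start page as level 1 is an assumption. Relevance crawling is not covered, since the abstract does not say how it works.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkImageParser(HTMLParser):
    """Collects absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def crawl(start_url, fetch, max_depth=5):
    """Breadth-first crawl of start_url's host, at most max_depth levels deep.

    fetch is a hypothetical callable: URL -> HTML text, or None if unavailable.
    The start page counts as level 1 (an assumption, not from the paper).
    Returns {url: {"depth": int, "html": str, "links": [...], "images": [...]}}.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 1)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkImageParser(url)
        parser.feed(html)
        pages[url] = {
            "depth": depth,
            "html": html,
            "links": parser.links,
            "images": parser.images,
        }
        if depth >= max_depth:
            continue  # record this page but do not follow its links
        for link in parser.links:
            # Stay on the same website and visit each URL only once.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


# A tiny in-memory "website" stands in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/a">A</a><img src="/logo.png">',
    "http://example.test/a": '<a href="/b">B</a><a href="http://other.test/">X</a>',
    "http://example.test/b": "<p>leaf</p>",
}
result = crawl("http://example.test/", SITE.get, max_depth=5)
for page_url, info in result.items():
    print(info["depth"], page_url, info["links"], info["images"])
```

Passing fetch as a parameter keeps the traversal logic separate from network access, so the same function can be tried on in-memory pages as above or wired to a real HTTP client. A crawler used on live sites would also need politeness controls such as robots.txt handling and rate limiting, which this sketch omits.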