Web Scraping Tool For Newspapers And Images Data Using Jsonify

Authors: Qingli Niu, Irfan Ali Kandhro, Anil Kumar, Shahnawaz Shah, Muhammad Hasan, Hifza Mehfooz Ahmed, Fei Liang
Language: English
Year of publication: 2022
Subject:
Source: Journal of Applied Science and Engineering, Vol 26, Iss 4, Pp 465-474 (2022)
Document type: article
ISSN: 2708-9967
2708-9975
DOI: 10.6180/jase.202304_26(4).0002
Description: Web scraping is the process of extracting data from a website quickly and efficiently. In this setting, Python offers a useful set of methods that help web editors improve the quality of the service they provide. The scraper works in three steps: 1) understand the structure of the web page, 2) design a regular expression pattern, and 3) use that pattern to extract the target data. In this paper, we also use the Flask, Request, and JSONify libraries to fetch the data; after processing, the data is transformed into JSON and made ready for CSV export with the help of an API. Once all required regex patterns have been generated, the system uses them as a set of rules. With these rules, the scraper tool works efficiently and, with the help of supporting libraries, achieves outstanding results in storing and extracting news and web-based information. The proposed web scraping tool eliminates the time and effort of manually collecting or copying data by automating the process. The scraper is found to be an easy and direct approach to extracting data from newspapers, websites, blogs, and images.
Database: Directory of Open Access Journals
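
The three steps named in the description (inspect the page structure, design regular expression patterns, apply them to extract data) and the JSON-then-CSV output can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the sample HTML, the patterns, and the names `SAMPLE_HTML`, `HEADLINE_RE`, `IMAGE_RE`, `scrape`, `to_json`, and `to_csv` are all hypothetical. It uses only the Python standard library and runs on an inline string, whereas the paper reports using Flask, Request, and JSONify to fetch and serve the data.

```python
import csv
import io
import json
import re

# Step 1 (assumed): the page structure was inspected by hand. This inline
# sample stands in for HTML that a real run would fetch over HTTP.
SAMPLE_HTML = """
<div class="story">
  <h2 class="headline"><a href="/news/1">City council approves budget</a></h2>
  <img src="/img/council.jpg" alt="Council meeting">
</div>
<div class="story">
  <h2 class="headline"><a href="/news/2">Local team wins final</a></h2>
  <img src="/img/final.jpg" alt="Final match">
</div>
"""

# Step 2: regular expression patterns written for that (hypothetical) structure.
HEADLINE_RE = re.compile(r'<h2 class="headline"><a href="([^"]+)">([^<]+)</a></h2>')
IMAGE_RE = re.compile(r'<img\s+src="([^"]+)"')


def scrape(html):
    """Step 3: apply the patterns and return one record per story.

    Assumes each story carries exactly one image, so headlines and images
    can be paired by position.
    """
    headlines = HEADLINE_RE.findall(html)  # list of (url, title) tuples
    images = IMAGE_RE.findall(html)        # list of image src strings
    return [
        {"url": url, "title": title.strip(), "image": image}
        for (url, title), image in zip(headlines, images)
    ]


def to_json(records):
    """Serialize the records to a JSON string."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Render the same records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "title", "image"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    records = scrape(SAMPLE_HTML)
    print(to_json(records))
    print(to_csv(records))
```

In a Flask application, the list returned by `scrape` could be passed to `flask.jsonify` inside a route to expose it through an API. Regex patterns of this kind are tied to one page layout and need revising whenever the markup changes.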