Web Scraping: State-of-the-Art and Areas of Application

Authors: Mamadou Bousso, Ousmane Sall, Seny Ndiaye Mbaye, Babiga Birregah, Edouard Ngor Sarr, Rabiyatou Diouf
Contributors: Université de Thiès, Laboratoire Modélisation et Sûreté des Systèmes (LM2S), Institut Charles Delaunay (ICD), Université de Technologie de Troyes (UTT)-Centre National de la Recherche Scientifique (CNRS)
Year of publication: 2019
Subject:
Source: IEEE BigData
2019 IEEE International Conference on Big Data (Big Data), Dec 2019, Los Angeles, United States. pp. 6040-6042, ⟨10.1109/BigData47090.2019.9005594⟩
DOI: 10.1109/bigdata47090.2019.9005594
Description: The main objective of Web Scraping is to extract information from one or many websites and process it into simple structures such as spreadsheets, databases, or CSV files. However, in addition to being a very complicated task, Web Scraping is resource- and time-consuming, especially when it is carried out manually. Previous studies have developed several automated solutions. The purpose of this article is to revisit the existing Web Scraping approaches, categories, and tools, as well as its areas of application.
Database: OpenAIRE