Web Scraping: State-of-the-Art and Areas of Application

Authors:	Mamadou Bousso, Ousmane Sall, Seny Ndiaye Mbaye, Babiga Birregah, Edouard Ngor Sarr, Rabiyatou Diouf
Contributors:	Université de Thiès, Laboratoire Modélisation et Sûreté des Systèmes (LM2S), Institut Charles Delaunay (ICD), Université de Technologie de Troyes (UTT)-Centre National de la Recherche Scientifique (CNRS)
Year of publication:	2019
Subject:	0303 health sciences; Computer science; business.industry; Process (engineering); 0206 medical engineering; 02 engineering and technology; Python (programming language); computer.software_genre; Task (project management); World Wide Web; 03 medical and health sciences; Information extraction; Resource (project management); Web page; [INFO]Computer Science [cs]; The Internet; business; computer; 020602 bioinformatics; Web scraping; 030304 developmental biology; computer.programming_language
Source:	IEEE BigData 2019, 2019 IEEE International Conference on Big Data (Big Data), Dec 2019, Los Angeles, United States, pp. 6040-6042, ⟨10.1109/BigData47090.2019.9005594⟩
DOI:	10.1109/bigdata47090.2019.9005594
Description:	International audience; The main objective of Web Scraping is to extract information from one or more websites and process it into simple structures such as spreadsheets, databases, or CSV files. However, in addition to being a very complicated task, Web Scraping is resource- and time-consuming, especially when it is carried out manually. Previous studies have developed several automated solutions. The purpose of this article is to review the existing Web Scraping approaches, categories, and tools, as well as its areas of application.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::48ddd5baeb3be495f10663f064db06d9 https://doi.org/10.1109/bigdata47090.2019.9005594