Web Scraping: State-of-the-Art and Areas of Application
Authors: | Mamadou Bousso, Ousmane Sall, Seny Ndiaye Mbaye, Babiga Birregah, Edouard Ngor Sarr, Rabiyatou Diouf |
---|---|
Contributors: | Université de Thiès, Laboratoire Modélisation et Sûreté des Systèmes (LM2S), Institut Charles Delaunay (ICD), Université de Technologie de Troyes (UTT)-Centre National de la Recherche Scientifique (CNRS) |
Year of publication: | 2019 |
Subject: | 0303 health sciences, Computer science, business.industry, Process (engineering), 0206 medical engineering, 02 engineering and technology, Python (programming language), computer.software_genre, Task (project management), World Wide Web, 03 medical and health sciences, Information extraction, Resource (project management), Web page, [INFO]Computer Science [cs], The Internet, business, computer, 020602 bioinformatics, Web scraping, 030304 developmental biology, computer.programming_language |
Source: | IEEE BigData 2019: 2019 IEEE International Conference on Big Data (Big Data), Dec 2019, Los Angeles, United States, pp. 6040-6042, ⟨10.1109/BigData47090.2019.9005594⟩ |
DOI: | 10.1109/bigdata47090.2019.9005594 |
Description: | International audience. The main objective of Web Scraping is to extract information from one or more websites and process it into simple structures such as spreadsheets, databases, or CSV files. However, in addition to being a very complicated task, Web Scraping is resource- and time-consuming, especially when it is carried out manually. Previous studies have developed several automated solutions. The purpose of this article is to revisit the existing Web Scraping approaches, categories, and tools, as well as the areas in which Web Scraping is applied. |
Database: | OpenAIRE |
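The description summarizes Web Scraping as extracting information from web pages and turning it into simple structures such as CSV files. The sketch below illustrates that extract-then-structure step on a tiny HTML table using only Python's standard library (`html.parser` and `csv`). It is a minimal illustration under stated assumptions, not the authors' method or any of the tools the article surveys: the helper names `TableExtractor` and `html_table_to_csv` and the sample page are hypothetical. It parses an in-memory HTML string rather than fetching a live page, and it does not handle nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text fragments (including text inside tags like <b>) into one cell.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Hypothetical helper: extract table rows from HTML and serialize them as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)  # csv.writer ends rows with "\r\n" by default
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample page; a real scraper would first download the page
    # (for example with urllib.request.urlopen) and respect the site's terms of use.
    page = """
    <table>
      <tr><th>Tool</th><th>Language</th></tr>
      <tr><td>Scrapy</td><td>Python</td></tr>
    </table>
    """
    print(html_table_to_csv(page), end="")
```

Parsing into a list of rows first and writing the CSV separately keeps the extraction logic independent of the output format, so the same rows could equally be stored in a spreadsheet or database, the other targets the description mentions.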