Search results - "web data extraction"

Academic article

Automatic Regular Expression Generation for Extracting Relevant Image Data From Web Pages Using Genetic Algorithms

Authors: Canan Aslanyurek, Tarik Yerlikaya

Published in: IEEE Access, Vol 12, Pp 90660-90669 (2024)

In this study, a method that automatically generates regular expressions using genetic algorithms is designed to extract relevant images on web pages. Data extraction, which is usually done with web scrapers, can also be done with regular expressions

External link: https://doaj.org/article/2de0b97f82cb42f59c6c8fe9e494adb6

Academic article

Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks

Authors: Sudhir Kumar Patnaik, C. Narendra Babu, Mukul Bhave

Published in: Big Data Mining and Analytics, Vol 4, Iss 4, Pp 279-297 (2021)

Data are crucial to the growth of e-commerce in today’s world of highly demanding hyper-personalized consumer experiences, which are collected using advanced web scraping technologies. However, core data extraction engines fail because they cannot

External link: https://doaj.org/article/a826c864be2d4aa081bc5069b7b2b972

Academic article

VB-PTC: Visual Block Multi-Record Text Extraction Based on Sensor Network Page Type Conversion

Authors: Jibing Gong, Hekai Zhang, Weixia Du, Huanhuan Li, Hongnian Wen

Published in: IEEE Access, Vol 8, Pp 167900-167913 (2020)

Usually, in addition to the main content, web pages contain additional information in the form of noise, such as navigation elements, sidebars and advertisements. This kind of noise has nothing to do with the main content, it will affect the tasks of

External link: https://doaj.org/article/8babdddbd9b041039b31a535c593d149

Academic article

Aesthetic Trends and Semantic Web Adoption of Media Outlets Identified through Automated Archival Data Extraction

Authors: Aristeidis Lamprogeorgos, Minas Pergantis, Michail Panagopoulos, Andreas Giannakoulopoulos

Published in: Future Internet, Vol 14, Iss 7, p 204 (2022)

The last decade has been a time of great progress in the World Wide Web and this progress has manifested in multiple ways, including both the diffusion and expansion of Semantic Web technologies and the advancement of the aesthetics and usability of

External link: https://doaj.org/article/99a4bc972ba148a183c33ee4498f4195

Academic article

Investigating the Country of Origin and the Role of the .eu TLD in External Trade of European Union Member States

Authors: Andreas Giannakoulopoulos, Minas Pergantis, Laida Limniati, Alexandros Kouretsis

Published in: Future Internet, Vol 14, Iss 6, p 174 (2022)

The Internet, and specifically the World Wide Web, has always been a useful tool in the effort to achieve more outward-looking economies. The launch of the .eu TLD (top-level domain) in December of 2005 introduced the concept of a pan-European Intern

External link: https://doaj.org/article/9953b89024564d4b87e1abde30fc2b03

Academic article

An Improved Approach for Deep Web Data Extraction

Authors: Deshmukh Shilpa, Karde P.P., Thakare V.R.

Published in: ITM Web of Conferences, Vol 40, p 03045 (2021)

The World Wide Web is a valuable wellspring of data which contains information in a wide range of organizations. The different organizations of pages go about as a boundary for performing robotized handling. Numerous business associations require inf

External link: https://doaj.org/article/c90a4135b0964ee9a6f08f31bb29c339

Dissertation/Thesis

The One Spider To Rule Them All: Web Scraping Simplified: Improving Analyst Productivity and Reducing Development Time with A Generalized Spider

Author: Johansson, Rikard

This thesis addresses the process of developing a generalized spider for web scraping, which can be applied to multiple sources, thereby reducing the time and cost involved in creating and maintaining individual spiders for each website or URL. The p

External link: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-329799
