Web Object Mining Using Entropy Increasing Rate
Author: | Hong Ping Hu, Jiang Feng Ni, Liu Rui |
---|---|
Year of publication: | 2011 |
Subject: |
business.industry; Computer science; Entropy (statistical thermodynamics); General Engineering; Pattern recognition; computer.software_genre; HTML element; Object detection; Information extraction; Entropy (classical thermodynamics); Entropy (information theory); Data mining; Artificial intelligence; Entropy (energy dispersal); Heuristics; business; computer; Entropy (arrow of time); Entropy (order and disorder); Web object |
Source: | Advanced Materials Research. :2602-2606 |
ISSN: | 1662-8985 |
DOI: | 10.4028/www.scientific.net/amr.403-408.2602 |
Description: | In this paper, we propose a new method of web object extraction based on entropy theory, which takes both tag structure and content pattern into consideration for object detection. First, it calculates the content entropy of each node in the HTML tag tree. Then, it uses the entropy increasing rate to capture the characteristics of the object region and identify the minimal sub-tree that contains the objects. Finally, a set of heuristics is employed for more accurate extraction. Experimental evaluation shows that it can enhance the overall effectiveness of object mining. |
Database: | OpenAIRE |
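
The record gives only the abstract, so the paper's actual formulas are not available here. As a rough illustration of the first two steps it describes (per-node content entropy, then an entropy increasing rate used to locate a minimal sub-tree), the following Python sketch may help. Everything in it is an assumption made for illustration: the `Node` class, the use of Shannon entropy over whitespace-separated tokens, the relative-difference definition of the rate, and the `keep_ratio` descent rule are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one node of an HTML tag tree."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def subtree_tokens(node: Node) -> list[str]:
    """Collect whitespace-separated tokens from a node and its descendants."""
    tokens = node.text.split()
    for child in node.children:
        tokens.extend(subtree_tokens(child))
    return tokens


def content_entropy(node: Node) -> float:
    """Shannon entropy (bits) of the token distribution in the subtree.

    Assumption: the paper's "content entropy" may be defined differently.
    """
    tokens = subtree_tokens(node)
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_increase_rate(parent: Node, child: Node) -> float:
    """Relative entropy gain when moving from child up to parent (assumed form)."""
    h_child = content_entropy(child)
    h_parent = content_entropy(parent)
    if h_child == 0:
        return math.inf if h_parent > 0 else 0.0
    return (h_parent - h_child) / h_child


def minimal_object_subtree(root: Node, keep_ratio: float = 0.9) -> Node:
    """Descend while one child keeps at least keep_ratio of the parent's entropy.

    Assumption: this threshold rule is a placeholder for the paper's criterion.
    """
    node = root
    while node.children:
        best = max(node.children, key=content_entropy)
        h_node = content_entropy(node)
        if h_node == 0 or content_entropy(best) < keep_ratio * h_node:
            break
        node = best
    return node


# Usage: a page with a repetitive navigation bar and a list of three records.
nav = Node("div", "home home home")
listing = Node("ul", children=[
    Node("li", "apple 1 red"),
    Node("li", "pear 2 green"),
    Node("li", "plum 3 blue"),
])
root = Node("html", children=[nav, listing])
region = minimal_object_subtree(root)
print(region.tag)
```

In this toy page the descent stops at the `ul` node, because no single `li` retains most of the list's entropy. The third step in the abstract, the heuristics for more accurate extraction, is not sketched because the record does not describe them.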