Web Object Mining Using Entropy Increasing Rate

Authors: Hong Ping Hu, Jiang Feng Ni, Liu Rui
Publication year: 2011
Subject:
Source: Advanced Materials Research. :2602-2606
ISSN: 1662-8985
DOI: 10.4028/www.scientific.net/amr.403-408.2602
Description: In this paper, we propose a new method for web object extraction based on entropy theory, which takes both tag structure and content pattern into consideration for object detection. First, it calculates the content entropy of each node in the HTML tag tree. Then, it uses the entropy increasing rate to capture the characteristics of the object region and identify the minimal sub-tree that contains the objects. Finally, a set of heuristics is employed for more accurate extraction. Experimental evaluation shows that the method can enhance the overall effectiveness of object mining.
Database: OpenAIRE
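
Illustrative sketch: the abstract names three steps (per-node content entropy, an entropy increasing rate, and selection of a minimal sub-tree) but does not give their formulas. The Python sketch below is therefore not the authors' method. It assumes that content entropy is the Shannon entropy of the whitespace-separated tokens under a node, that the increasing rate is the entropy a node gains over the mean of its children divided by the number of children, and that the object region is the node with the lowest such rate among nodes with at least two children. The names Node, content_entropy, entropy_increasing_rate and candidate_region are hypothetical, and the heuristic refinement step is omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A simplified HTML tag-tree node (hypothetical structure)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def tokens(node: Node) -> List[str]:
    """Collect whitespace-separated tokens from the whole subtree."""
    out = node.text.split()
    for child in node.children:
        out.extend(tokens(child))
    return out


def content_entropy(node: Node) -> float:
    """Assumed definition: Shannon entropy (bits) of the subtree's tokens."""
    counts = Counter(tokens(node))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_increasing_rate(node: Node) -> float:
    """Assumed definition: entropy gained over the mean child, per child."""
    if not node.children:
        return 0.0
    n = len(node.children)
    child_mean = sum(content_entropy(c) for c in node.children) / n
    return (content_entropy(node) - child_mean) / n


def candidate_region(root: Node, min_children: int = 2) -> Optional[Node]:
    """Assumed rule: pick the node with the lowest rate among nodes that
    have at least min_children children (repeated records share tokens,
    so their parent's entropy grows slowly)."""
    best = None
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if len(node.children) < min_children:
            continue
        if best is None or entropy_increasing_rate(node) < entropy_increasing_rate(best):
            best = node
    return best


# Toy page: a navigation block plus a list of three similar records.
records = Node("ul", children=[
    Node("li", "book A price 10"),
    Node("li", "book B price 12"),
    Node("li", "book C price 15"),
])
page = Node("html", children=[
    Node("body", children=[Node("div", "home about"), records]),
])

region = candidate_region(page)
print(region.tag)  # prints: ul
```

On this toy page the list of records is selected because its items repeat the tokens "book" and "price", so its entropy rises more slowly per child than that of the body node, whose children have unrelated content.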