Web Page Segmentation for Non Visual Skimming
Authors: | Judith Jeyafreeda Andrew, Stéphane Ferrari, Fabrice Maurel, Gaël Dias, Emmanuel Giguet |
---|---|
Contributors: | Equipe Hultech - Laboratoire GREYC - UMR6072, Groupe de Recherche en Informatique, Image et Instrumentation de Caen (GREYC), Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Normandie Université (NU)-Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU), Emmanuel, Giguet |
Language: | English |
Publication year: | 2019 |
Subject: |
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
[INFO.INFO-WB]Computer Science [cs]/Web
[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]
[INFO]Computer Science [cs] |
Source: | The 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33), Sep 2019, Hakodate, Japan; HAL |
ISSN: | 2619-7782 |
Description: | Web page segmentation aims to break a page into smaller blocks in which contents with coherent semantics are kept together. Examples of tasks targeted by such a technique are advertisement detection and main content extraction. In this paper, we study different segmentation strategies for the task of non visual skimming. For that purpose, we consider web page segmentation as a clustering problem of visual elements, where (1) all visual elements must be clustered, (2) a fixed number of clusters must be discovered, and (3) the elements of a cluster should be visually connected. We therefore study three algorithms that comply with these constraints: K-means, F-K-means, and Guided Expansion. The evaluation shows that Guided Expansion yields statistically relevant results in terms of compactness and separateness, and satisfies more logical constraints than the other strategies. |
Database: | OpenAIRE |
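
The clustering formulation in the description can be illustrated with a minimal sketch of plain K-means over the centre points of visual elements. This is not the authors' implementation: the `(x, y, width, height)` bounding-box format, the `box_center` and `kmeans` helpers, and the choice of the first k points as seeds are all assumptions made for illustration. The sketch covers constraints (1) and (2) only (every element is assigned, and the number of clusters is fixed) and does not check visual connectivity, constraint (3).

```python
from math import dist


def box_center(box):
    """Centre of a hypothetical (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on 2D points.

    Seeds are the first k points (an assumption for determinism).
    Returns one cluster label per point and the final centroids.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every element gets exactly one cluster.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                # Keep the old centroid if a cluster received no members.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Four made-up elements: two near the top of the page, two further down.
boxes = [(0, 0, 100, 20), (0, 30, 100, 20),
         (0, 400, 100, 20), (0, 430, 100, 20)]
centers = [box_center(b) for b in boxes]
labels, centroids = kmeans(centers, k=2)
print(labels)  # [0, 0, 1, 1]
```

With these made-up boxes, the two top elements end up in one cluster and the two lower elements in the other. A real page would need a richer distance than centre-to-centre Euclidean distance, since this one ignores element size and adjacency.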