Transforming XML Trees for Efficient Classification and Clustering
Authors: | Laurent Candillier, Isabelle Tellier, Fabien Torre |
---|---|
Year of publication: | 2006 |
Subject: |
Document Structure Description; Information retrieval; Computer science; Efficient XML Interchange; XML Signature; XML validation; computer.file_format; computer.software_genre; ComputingMethodologies_PATTERNRECOGNITION; XML database; XML Schema Editor; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; XML schema; computer; XML Catalog; computer.programming_language |
Source: | Lecture Notes in Computer Science, ISBN: 3540349626, 9783540349624; INEX |
DOI: | 10.1007/11766278_36 |
Description: | Most existing methods we know of for handling datasets of XML documents work directly on the trees that represent those documents. In this paper we investigate a different kind of representation for manipulating XML documents. Our idea is to transform the trees into sets of attribute-value pairs, so that various existing classification and clustering methods can be applied to such data and their strengths exploited. We apply this strategy to both the classification task and the clustering task, using only the structural description of the XML documents. For instance, we show that boosted C5 gives very good results when classifying XML documents transformed in this way. In the clustering task, SSC has the advantage of producing an interpretable representation of the clusters it finds. Finally, we also propose an adaptation of SSC for the classification of XML documents, so that the resulting classifier is understandable. |
Database: | OpenAIRE |
External link: |
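
The abstract's central idea, flattening an XML tree into attribute-value pairs that use structure alone, can be illustrated with a minimal Python sketch. The record does not say which attributes the authors extract, so the three feature families used here (tag counts, parent-child edge counts, and root-to-node path counts) and the function name `tree_to_attributes` are assumptions made for illustration, not the paper's actual feature set.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tree_to_attributes(xml_text: str) -> dict:
    """Flatten an XML tree into structural attribute-value pairs.

    Hypothetical feature families (assumed, not taken from the paper):
      tag:<name>          number of elements with that tag
      edge:<parent>><kid> number of parent-child relations
      path:<root path>    number of elements reached by that path
    Text content is ignored, so only structure is described.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def visit(node, path):
        current = f"{path}/{node.tag}"
        features[f"tag:{node.tag}"] += 1
        features[f"path:{current}"] += 1
        for child in node:
            features[f"edge:{node.tag}>{child.tag}"] += 1
            visit(child, current)

    visit(root, "")
    return dict(features)


doc = "<article><title/><sec><p/><p/></sec></article>"
attrs = tree_to_attributes(doc)
```

Each document then becomes one row of a flat table, with missing attributes treated as zero, which is the kind of input that generic attribute-value learners such as decision tree or clustering tools expect.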