Description: |
In this paper, we investigate the automatic classification of XML documents into predefined categories. We propose a classification model that combines the content and structure of documents. We further propose using semantic resources, specifically WordNet and an ontology linked to the terms of the corpus, to model the notion of semantic neighborhood through a similarity measure between terms. To validate the results, we used the INEX 2007 XML corpus. |