Minimization of the Disagreements in Clustering Aggregation
Author: | Baroudi Rouba, Safia Nait Bahloul, Youssef Amghar |
---|---|
Contributors: | Département d'Informatique [Oran], Université des sciences et de la Technologie d'Oran Mohamed Boudiaf [Oran] (USTO MB), Service Oriented Computing (SOC), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2) |
Year of publication: | 2008 |
Subject: |
Computer science
computer.internet_protocol Process (engineering) Data classification 02 engineering and technology computer.software_genre ComputingMethodologies_PATTERNRECOGNITION 020204 information systems 0202 electrical engineering electronic engineering information engineering Library classification [INFO]Computer Science [cs] 020201 artificial intelligence & image processing Data mining Minification Element (category theory) Cluster analysis computer XML |
Source: | Communications in Computer and Information Science, ISBN: 9783540859291; ICIC (3), International Conference on Intelligent Computing, Sep 2008, Shanghai, China, pp. 517-524, ⟨10.1007/978-3-540-85930-7⟩ |
DOI: | 10.1007/978-3-540-85930-7_66 |
Description: | International audience. Abstract: Several experiments have shown that the choice of which parts of the documents are selected affects the result of the classification and, consequently, the number of queries that the resulting clusters can answer. Clustering aggregation provides a very natural method of data classification: it takes the m classifications produced by m attributes and tries to produce an "optimal" classification that is as close as possible to all m of them. The optimization consists in minimizing the number of pairs of objects (u, v) that one classification C places in the same cluster while another classification C' places them in different clusters. This number corresponds to the notion of disagreements. We propose an approach that exploits the various elements of an XML document, each participating in a different view, to produce different classifications. These classifications are then aggregated into a single classification that minimizes the number of disagreements. Our approach has two steps: the first applies the K-means algorithm to the collection of XML documents, each time considering a different element of the document; the second aggregates the classifications obtained in the first step to produce the one that minimizes the number of disagreements. |
Database: | OpenAIRE |
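To make the disagreement measure from the abstract concrete, here is a minimal Python sketch. It assumes each clustering (for instance, the output of one K-means run on one XML element) is a list of cluster labels, one per document. `disagreements` counts the pairs (u, v) that one clustering places in the same cluster while the other separates them, and `total_disagreements` sums this count over the m input clusterings, which is the quantity the aggregation step minimizes. The `aggregate` function is a hypothetical greedy local search added only for illustration. It is not the authors' aggregation algorithm, which the abstract does not specify, and all function names here are illustrative.

```python
from itertools import combinations


def disagreements(c1, c2):
    """Count pairs (u, v) that one clustering puts in the same cluster
    and the other puts in different clusters. c[u] is the label of object u."""
    if len(c1) != len(c2):
        raise ValueError("both clusterings must label the same objects")
    return sum(
        (c1[u] == c1[v]) != (c2[u] == c2[v])
        for u, v in combinations(range(len(c1)), 2)
    )


def total_disagreements(candidate, clusterings):
    """Objective to minimize: disagreements summed over the m input clusterings."""
    return sum(disagreements(candidate, c) for c in clusterings)


def aggregate(clusterings, max_rounds=100):
    """Hypothetical greedy local search (not the method from the paper).

    Starts from the input clustering with the fewest total disagreements,
    then moves single objects to whichever cluster (or a new singleton)
    strictly lowers the objective, until no move helps or max_rounds is hit."""
    m, n = len(clusterings), len(clusterings[0])
    # sep[u][v] = number of input clusterings that put u and v in different clusters
    sep = [[sum(c[u] != c[v] for c in clusterings) for v in range(n)]
           for u in range(n)]

    def cost_of(labels, u, label):
        # Disagreements on the pairs involving u if u is given `label`:
        # together -> clusterings that separate them; apart -> those that join them.
        return sum(
            sep[u][v] if labels[v] == label else m - sep[u][v]
            for v in range(n) if v != u
        )

    best = min(clusterings, key=lambda c: total_disagreements(c, clusterings))
    ids = {}
    labels = [ids.setdefault(x, len(ids)) for x in best]  # relabel as 0, 1, 2, ...

    for _ in range(max_rounds):
        improved = False
        for u in range(n):
            options = sorted(set(labels) | {max(labels) + 1})  # existing or new cluster
            current = cost_of(labels, u, labels[u])
            target = min(options, key=lambda lab: cost_of(labels, u, lab))
            if cost_of(labels, u, target) < current:
                labels[u] = target
                improved = True
        if not improved:
            break
    return labels


if __name__ == "__main__":
    runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    result = aggregate(runs)
    print(result, total_disagreements(result, runs))
```

In the usage example, the three hand-written label lists stand in for K-means results on three different XML elements. Moving one object changes only the pairs that involve it, so the sketch precomputes `sep` once and evaluates each candidate move in O(n) time. Because every accepted move strictly lowers the objective, the search terminates, but like any local search it may stop at a local rather than global minimum.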