Minimization of the Disagreements in Clustering Aggregation

Authors: Baroudi Rouba, Safia Nait Bahloul, Youssef Amghar
Contributors: Département d'Informatique [Oran], Université des sciences et de la Technologie d'Oran Mohamed Boudiaf [Oran] (USTO MB), Service Oriented Computing (SOC), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2)
Year of publication: 2008
Subject:
Source: Communications in Computer and Information Science ISBN: 9783540859291
ICIC (3)
International Conference on Intelligent Computing
International Conference on Intelligent Computing, Sep 2008, Shanghai, China. pp.517-524, ⟨10.1007/978-3-540-85930-7⟩
DOI: 10.1007/978-3-540-85930-7_66
Description: International audience; Abstract: Several experiments have shown that the choice of which parts of a document are used affects the resulting classification and, consequently, the number of queries that the resulting clusters can answer. Aggregation offers a natural approach to data classification: it takes the m classifications produced by the m attributes and seeks a so-called "optimal" classification that is as close as possible to all m of them. The optimization consists in minimizing the number of pairs of objects (u, v) that one classification C places in the same cluster while another classification C' places in different clusters. This number is the number of disagreements. We propose an approach that uses the various elements of an XML document, each contributing a different view, to produce different classifications. These classifications are then aggregated into a single classification that minimizes the number of disagreements. Our approach has two steps: the first applies the K-means algorithm to the collection of XML documents, considering a different element of the document each time; the second aggregates the classifications obtained in this way to produce the one that minimizes the number of disagreements.
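The notion of disagreement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`disagreements`, `total_disagreements`, `best_of_inputs`) and the representation of a classification as a list of cluster labels are assumptions made for illustration, and the aggregation step shown here is a simple stand-in that picks, among the input classifications, the one with the fewest total disagreements against all the others.

```python
from itertools import combinations


def disagreements(c1, c2):
    """Count pairs (u, v) placed together by one clustering and apart by the other.

    c1 and c2 are lists of cluster labels, one label per object (assumed layout).
    """
    if len(c1) != len(c2):
        raise ValueError("both clusterings must cover the same objects")
    count = 0
    for u, v in combinations(range(len(c1)), 2):
        together_in_c1 = c1[u] == c1[v]
        together_in_c2 = c2[u] == c2[v]
        if together_in_c1 != together_in_c2:
            count += 1
    return count


def total_disagreements(candidate, clusterings):
    """Sum of disagreements between a candidate and each of the m clusterings."""
    return sum(disagreements(candidate, c) for c in clusterings)


def best_of_inputs(clusterings):
    """Simplified aggregation: return the input clustering with the lowest total.

    This only searches among the given clusterings; it is a heuristic stand-in,
    not a search over all possible clusterings.
    """
    return min(clusterings, key=lambda c: total_disagreements(c, clusterings))


if __name__ == "__main__":
    # Three hypothetical clusterings of four documents, e.g. one per XML element.
    by_title = [0, 0, 1, 1]
    by_author = [0, 0, 1, 2]
    by_body = [0, 1, 1, 1]
    views = [by_title, by_author, by_body]
    for view in views:
        print(view, total_disagreements(view, views))
    print("selected:", best_of_inputs(views))
```

In this toy input the three views could come from running K-means once per XML element; the sketch takes the label lists as given and only covers the counting and selection steps.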
Database: OpenAIRE