Robust clustering tools based on optimal transportation
Authors: | E. del Barrio, Juan A. Cuesta-Albertos, Carlos Matrán, Agustín Mayo-Iscar |
---|---|
Year of publication: | 2018 |
Subject: | FOS: Computer and information sciences; Statistics and Probability; Computer science; Computation; 010103 numerical & computational mathematics; 01 natural sciences; Theoretical Computer Science; Methodology (stat.ME); Maxima and minima; 010104 statistics & probability; Computational Theory and Mathematics; Robustness (computer science); Trimming; 0101 mathematics; Statistics, Probability and Uncertainty; Cluster analysis; Algorithm; Statistics - Methodology |
Source: | Statistics and Computing 29:139-160 |
ISSN: | 1573-1375, 0960-3174 |
Abstract: | A robust clustering method for probability distributions in Wasserstein space is introduced. This new ‘trimmed k-barycenters’ approach relies on recent results on barycenters in Wasserstein space that make the intensive computation required by clustering algorithms feasible. The possibility of trimming the most discrepant distributions yields a gain in stability and robustness that is highly convenient in this setting. As a remarkable application, we consider a parallelized clustering setup in which each of $m$ units processes a portion of the data and produces a clustering report encoded as $k$ probability distributions. We prove that the trimmed k-barycenter of the $m\times k$ reports produces a consistent aggregation, which we regard as the result of a ‘wide consensus’. We also prove that a weighted version of trimmed k-means algorithms based on k-barycenters in Wasserstein space preserves the descent property of the concentration step, guaranteeing convergence to local minima. We illustrate the methodology with simulated and real data examples. These include clustering populations by age distributions and analysis of cytometric data. |
Database: | OpenAIRE |
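As a rough illustration of the trimmed k-barycenters idea described in the abstract, the following is a minimal sketch under a simplifying assumption: the distributions are one-dimensional (like age distributions) and each is represented by its quantile function sampled on a common grid of levels. In one dimension the 2-Wasserstein distance equals the L2 distance between quantile functions, and the weighted Wasserstein barycenter is the weighted average of quantile functions, so each concentration step reduces to trimmed k-means on quantile vectors. The function names (`w2_1d`, `barycenter_1d`, `concentration_step`, `trimmed_k_barycenters`), the partial trimming of the boundary distribution and the random-restart scheme are hypothetical choices for illustration, not the authors' implementation, which covers more general settings.

```python
import math
import random
from statistics import NormalDist


def w2_1d(q1, q2):
    """Approximate W2 distance between two 1-D distributions, each given by its
    quantile function sampled on the same uniform grid of levels in (0, 1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1))


def barycenter_1d(quantiles, weights):
    """Weighted W2 barycenter in 1-D: the weighted mean of the quantile functions."""
    total = sum(weights)
    return [sum(w * q[j] for q, w in zip(quantiles, weights)) / total
            for j in range(len(quantiles[0]))]


def concentration_step(quantiles, weights, centers, alpha):
    """One concentration step (hypothetical helper): assign, trim, recompute."""
    n = len(quantiles)
    labels, dists = [], []
    for q in quantiles:  # assign every distribution to its closest center
        d = [w2_1d(q, c) for c in centers]
        j = min(range(len(centers)), key=d.__getitem__)
        labels.append(j)
        dists.append(d[j])
    # Keep the closest distributions until a fraction 1 - alpha of the total
    # weight is retained; the boundary distribution may be kept only partially.
    budget = (1.0 - alpha) * sum(weights)
    kept = [0.0] * n
    for i in sorted(range(n), key=dists.__getitem__):
        if budget <= 1e-12:
            break
        kept[i] = min(weights[i], budget)
        budget -= kept[i]
    cost = sum(kept[i] * dists[i] ** 2 for i in range(n))  # trimmed W2 cost
    new_centers = []
    for j, c in enumerate(centers):  # barycenter of each cluster's kept members
        members = [i for i in range(n) if labels[i] == j and kept[i] > 0]
        if members:
            new_centers.append(barycenter_1d([quantiles[i] for i in members],
                                             [kept[i] for i in members]))
        else:
            new_centers.append(c)  # empty cluster: keep the previous center
    return new_centers, labels, kept, cost


def trimmed_k_barycenters(quantiles, k, alpha, weights=None,
                          n_init=20, n_iter=100, seed=0):
    """Hypothetical driver: iterate concentration steps from random starts."""
    n = len(quantiles)
    weights = list(weights) if weights is not None else [1.0] * n
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):  # restarts, since only a local minimum is reached
        centers = [list(quantiles[i]) for i in rng.sample(range(n), k)]
        prev = math.inf
        for _ in range(n_iter):
            centers, labels, kept, cost = concentration_step(
                quantiles, weights, centers, alpha)
            if cost >= prev - 1e-12:  # trimmed cost stopped decreasing
                break
            prev = cost
        if best is None or cost < best[3]:
            best = (centers, labels, kept, cost)
    return best


# Toy data (made up for illustration): N(mu, 1) quantile functions, two groups
# plus one outlying distribution at mu = 100.
grid = [(j + 0.5) / 20 for j in range(20)]
mus = [0.0, 0.25, 0.5, 0.75, 1.0, 10.0, 10.5, 11.0, 11.5, 100.0]
data = [[NormalDist(mu, 1.0).inv_cdf(p) for p in grid] for mu in mus]
centers, labels, kept, cost = trimmed_k_barycenters(data, k=2, alpha=0.1)
print(kept[-1])         # 0.0: the mu = 100 distribution is trimmed
print(round(cost, 6))   # 1.875
```

The optional `weights` argument loosely mirrors the weighted variant mentioned in the abstract: to imitate the ‘wide consensus’ aggregation, one could pass the $m\times k$ reported distributions with, for example, their cluster proportions as weights. Restarts are used because the concentration step only guarantees convergence to a local minimum.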