Statistical analysis of a hierarchical clustering algorithm with outliers

Authors: Nicolas Klutchnikoff, Audrey Poterie, Laurent Rouvière
Contributors: Université de Bretagne Sud (UBS), Institut de Recherche Mathématique de Rennes (IRMAR), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-École normale supérieure - Rennes (ENS Rennes)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut Agro Rennes Angers, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Year of publication: 2022
Subject:
Source: Journal of Multivariate Analysis
Journal of Multivariate Analysis, 2022, 192, article no. 105075. ⟨10.1016/j.jmva.2022.105075⟩
ISSN: 0047-259X
1095-7243
DOI: 10.48550/arxiv.2203.09781
Description: It is well known that the classical single linkage algorithm usually fails to identify clusters in the presence of outliers. In this paper, we propose a new version of this algorithm and study its mathematical performance. In particular, we establish an oracle-type inequality which ensures that our procedure recovers the clusters with high probability under minimal assumptions on the distribution of the outliers. From this inequality we deduce the consistency of our algorithm and some rates of convergence in various situations. The performance of our approach is also assessed through simulation studies, and a comparison with classical clustering algorithms on simulated data is presented.
Database: OpenAIRE