Clustering dependent observations with copula functions
Author: | Di Lascio, F. Marta L.; Giannerini, Simone |
---|---|
Contributors: | Di Lascio, F. Marta L.; Giannerini, S. |
Language: | English |
Year of publication: | 2019 |
Subject: |
Statistics and Probability
Fuzzy clustering; 05 social sciences; Correlation clustering; Monte Carlo method; 050401 social sciences methods; Probability and statistics; 01 natural sciences; Clustering; Copula (probability theory); 010104 statistics & probability; 0504 sociology; CURE data clustering algorithm; Canopy clustering algorithm; Multivariate dependence structure; Biological tumor sample; Data mining; 0101 mathematics; Statistics, Probability and Uncertainty; Cluster analysis; Copula function; Mathematics |
Description: | This paper deals with the problem of clustering dependent observations according to their underlying complex generating process. Di Lascio and Giannerini (Journal of Classification 29(1):50–75, 2012) introduced the CoClust, a clustering algorithm based on copula functions that achieves this task but carries a high computational burden. Moreover, the CoClust automatically allocates all observations to clusters; thus, it cannot discard potentially irrelevant observations. In this paper we introduce an improved version of the CoClust that overcomes both issues and performs better in many respects. By means of a Monte Carlo study we investigate the features of the algorithm and show that it consistently improves on the original CoClust. The validity of our proposal is also supported by applications to real data sets of human breast tumor samples, for which the algorithm provides a meaningful biological interpretation. The new algorithm is implemented and made available through an updated version of the R package CoClust. |
Database: | OpenAIRE |
External link: |