Modelling the role of variables in model-based cluster analysis
Author: | Annamaria Manisi, Giuliano Galimberti, Gabriele Soffritti |
---|---|
Contributors: | Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele |
Year of publication: | 2017 |
Subject: | Statistics and Probability; Variable selection; Feature selection; 02 engineering and technology; 01 natural sciences; Theoretical Computer Science; 010104 statistics & probability; 0202 electrical engineering, electronic engineering, information engineering; 0101 mathematics; EM algorithm; Cluster analysis; Mathematics; Variables; Partial residual plot; Model selection; Pattern recognition; Mixture model; Variable (computer science); Clusterwise linear regression; Gaussian mixture model; Genetic algorithm; Computational Theory and Mathematics; Multiple cluster structure; Identifiability; 020201 artificial intelligence & image processing; Artificial intelligence; Statistics, Probability and Uncertainty; Algorithm |
Source: | Statistics and Computing. 28:145-169 |
ISSN: | 1573-1375 0960-3174 |
DOI: | 10.1007/s11222-017-9723-0 |
Description: | In the framework of cluster analysis based on Gaussian mixture models, it is usually assumed that all the variables provide information about the clustering of the sample units. Several variable selection procedures are available in order to detect the structure of interest for the clustering when this structure is contained in a variable sub-vector. Currently, in these procedures a variable is assumed to play one of (up to) three roles: (1) informative, (2) uninformative and correlated with some informative variables, (3) uninformative and uncorrelated with any informative variable. A more general approach for modelling the role of a variable is proposed by taking into account the possibility that the variable vector provides information about more than one structure of interest for the clustering. This approach is developed by assuming that such information is given by non-overlapped and possibly correlated sub-vectors of variables; it is also assumed that the model for the variable vector is equal to a product of conditionally independent Gaussian mixture models (one for each variable sub-vector). Details about model identifiability, parameter estimation and model selection are provided. The usefulness and effectiveness of the described methodology are illustrated using simulated and real datasets. |
Database: | OpenAIRE |
External link: |