Author: |
Alexander H. Foss, Marianthi Markatou |
Language: |
English |
Year of publication: |
2018 |
Subject: |
|
Source: |
Journal of Statistical Software, Vol 83, Iss 1, Pp 1-44 (2018) |
Document type: |
article |
ISSN: |
1548-7660 |
DOI: |
10.18637/jss.v083.i13 |
Description: |
In this paper we discuss the challenge of equitably combining continuous (quantitative) and categorical (qualitative) variables for the purpose of cluster analysis. Existing techniques require strong parametric assumptions or difficult-to-specify tuning parameters. We describe the kamila package, which includes a weighted k-means approach to clustering mixed-type data, a method for estimating weights for mixed-type data (Modha-Spangler weighting), and an additional semiparametric method recently proposed in the literature (KAMILA). We include a discussion of strategies for estimating the number of clusters in the data, and describe the implementation of one such method in the current R package. Background and usage of these clustering methods are presented. We then show how the KAMILA algorithm can be adapted to a map-reduce framework, and implement the resulting algorithm using Hadoop for clustering very large mixed-type data sets. |
Database: |
Directory of Open Access Journals |