Adaptive kernel fuzzy clustering for missing data.
Author: | Rodrigues AKG; Departamento de Estatística, CASTLab, CCEN, Universidade Federal de Pernambuco, Cidade Universitária, Recife, PE, Brazil., Ospina R; Departamento de Estatística, CASTLab, CCEN, Universidade Federal de Pernambuco, Cidade Universitária, Recife, PE, Brazil., Ferreira MRP; Departamento de Estatística, DataLab, Centro de Ciências Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, PB, Brazil. |
---|---|
Language: | English |
Source: | PloS one [PLoS One] 2021 Nov 12; Vol. 16 (11), pp. e0259266. Date of Electronic Publication: 2021 Nov 12 (Print Publication: 2021). |
DOI: | 10.1371/journal.pone.0259266 |
Abstract: | Many machine learning procedures, including clustering analysis, are often affected by missing values. This work proposes and evaluates a Kernel Fuzzy C-means clustering algorithm based on kernelization of the metric with local adaptive distances (VKFCM-K-LP) under three strategies for dealing with missing data. The first strategy, called Whole Data Strategy (WDS), performs clustering only on the complete part of the dataset, i.e., it discards all instances with missing data. The second approach uses the Partial Distance Strategy (PDS), in which partial distances are computed over all available (observed) values and then rescaled by the reciprocal of the proportion of observed values. The third technique, called Optimal Completion Strategy (OCS), computes missing values iteratively as auxiliary variables in the optimization of a suitable objective function. The clustering results were evaluated according to different metrics. The best performance of the clustering algorithm was achieved under the PDS and OCS strategies. Under the OCS approach, new datasets were derived and the missing values were estimated dynamically in the optimization process. Clustering under the OCS strategy also outperformed the clusters obtained by applying the VKFCM-K-LP algorithm to a version of the data in which missing values were imputed beforehand by the mean or the median of the observed values. Competing Interests: The authors have declared that no competing interests exist. |
Database: | MEDLINE |
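
The Partial Distance Strategy described in the abstract can be illustrated with a minimal sketch. It assumes a plain squared Euclidean distance between an instance and a cluster prototype, not the kernelized adaptive distance used by VKFCM-K-LP, and it assumes missing entries are encoded as NaN. The function name `partial_distance_sq` is hypothetical and is not taken from the paper.

```python
import math


def partial_distance_sq(x, v):
    """Squared Euclidean partial distance between instance x and prototype v.

    Assumptions (not from the paper): missing entries of x are NaN, and the
    base distance is plain squared Euclidean rather than a kernel distance.
    """
    if len(x) != len(v):
        raise ValueError("x and v must have the same length")

    # Indices of the features that are actually observed in x.
    observed = [j for j, xj in enumerate(x) if not math.isnan(xj)]
    if not observed:
        raise ValueError("instance has no observed values")

    # Distance computed only over the observed features.
    partial = sum((x[j] - v[j]) ** 2 for j in observed)

    # Rescale by the reciprocal of the proportion of observed values,
    # i.e. multiply by (total features / observed features).
    return (len(x) / len(observed)) * partial


if __name__ == "__main__":
    nan = float("nan")
    print(partial_distance_sq([1.0, nan, 3.0], [0.0, 5.0, 1.0]))
```

With one of three features missing, the sum over the two observed features is multiplied by 3/2, so that instances with different numbers of missing values remain comparable. For a fully observed instance the scale factor is 1 and the result is the ordinary squared Euclidean distance.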