Kernel conditional clustering and kernel conditional semi-supervised learning

Author:	Xiao He, Damian Roqueiro, Karsten M. Borgwardt, Thomas Gumbsch
Publication year:	2019
Subject:	Alternative clustering; Computer science; Conditional semi-supervised learning; 02 engineering and technology; Semi-supervised learning; Label propagation; Machine learning; Measure (mathematics); Set (abstract data type); Artificial intelligence; 020204 information systems; Covariate; 0202 electrical engineering, electronic engineering, information engineering; Cluster analysis; Ground truth; Conditional dependence; Human-Computer Interaction; ComputingMethodologies_PATTERNRECOGNITION; Hardware and Architecture; Kernel (statistics); Conditional dependence measure; Conditional clustering; Software; Information Systems
Source:	Knowledge and Information Systems, 62 (3)
ISSN:	0219-3116, 0219-1377
DOI:	10.1007/s10115-019-01334-5
Description:	The results of clustering are often affected by covariates that are independent of the clusters one would like to discover. Traditionally, alternative clustering algorithms can be used to solve such clustering problems. However, these suffer from at least one of the following problems: (1) continuous covariates or nonlinearly separable clusters cannot be handled; (2) assumptions are made about the distribution of the data; (3) one or more hyper-parameters need to be set. Covariates also affect other types of problems, such as semi-supervised learning. To the best of our knowledge, there is no existing method addressing the semi-supervised learning setting in the presence of covariates. Here we propose two novel algorithms, named kernel conditional clustering (KCC) and kernel conditional semi-supervised learning (KCSSL), whose objectives are derived from a kernel-based conditional dependence measure. KCC is parameter-light and makes no assumptions about the cluster structure, the covariates, or the distribution of the data, while KCSSL is fully parameter-free. On both simulated and real-world datasets, the proposed KCC and KCSSL algorithms perform better than state-of-the-art methods. The former detects the ground truth cluster structures more accurately, and the latter makes more accurate predictions.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d46be02b180eb74ce9b4012ac73b85b3 https://doi.org/10.1007/s10115-019-01334-5