Multiple kernel learning for integrative consensus clustering of omic datasets

Authors: Alessandra Cabassi, Paul D. W. Kirk
Contributors: Cabassi, Alessandra [0000-0003-1605-652X], Kirk, Paul [0000-0002-5931-7489], Apollo - University of Cambridge Repository
Language: English
Subject:
Statistics and Probability
FOS: Computer and information sciences
Computer Science - Machine Learning
Consensus
AcademicSubjects/SCI01060
Computer science
Information Storage and Retrieval
Context (language use)
Machine Learning (stat.ML)
Machine learning
computer.software_genre
Biochemistry
Statistics - Applications
Machine Learning (cs.LG)
Methodology (stat.ME)
03 medical and health sciences
Kernel (linear algebra)
0302 clinical medicine
Robustness (computer science)
Statistics - Machine Learning
Neoplasms
Consensus clustering
Cluster Analysis
Humans
Applications (stat.AP)
Cluster analysis
Molecular Biology
Statistics - Methodology
030304 developmental biology
0303 health sciences
Multiple kernel learning
business.industry
Systems Biology
Original Papers
Computer Science Applications
Computational Mathematics
ComputingMethodologies_PATTERNRECOGNITION
Computational Theory and Mathematics
030220 oncology & carcinogenesis
Kernel (statistics)
Benchmark (computing)
Artificial intelligence
business
computer
Algorithms
Source: Bioinformatics
ISSN: 1460-2059
1367-4803
DOI: 10.1093/bioinformatics/btaa593
Description: Diverse applications - particularly in tumour subtyping - have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster-Of-Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets, or datasets that define conflicting clustering structures, is unclear. We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. R packages "klic" and "coca" are available on the Comprehensive R Archive Network.
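Illustration: the weighted-contribution idea in the description can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' method and not the API of the "klic" or "coca" R packages: each dataset is assumed to be summarised by a co-membership matrix built from one set of cluster labels, the weights are fixed by hand rather than learned, and the names `comembership_kernel` and `combine_kernels` are hypothetical.

```python
import numpy as np


def comembership_kernel(labels):
    """Binary kernel: entry (i, j) is 1 if items i and j share a cluster label.

    Hypothetical helper used only for this illustration.
    """
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices, one weight per dataset.

    Assumption: weights are supplied by hand here; in a multiple kernel
    learning method they would be estimated from the data.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return sum(w * K for w, K in zip(weights, kernels))


# Toy example with six items and two datasets (labels are made up).
informative = comembership_kernel([0, 0, 0, 1, 1, 1])  # clear two-cluster structure
noisy = comembership_kernel([0, 1, 0, 1, 0, 1])        # conflicting structure

# Down-weight the noisy dataset relative to the informative one.
combined = combine_kernels([informative, noisy], weights=[0.9, 0.1])
```

With these hand-picked weights, pairs of items grouped together only by the informative dataset keep a high similarity in `combined`, while pairs grouped together only by the noisy dataset keep a low one. The combined matrix could then be passed to any kernel-based clustering routine.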
Comment: Manuscript: 18 pages, 6 figures. Supplement: 29 pages, 19 figures. This version contains additional simulation studies and comparisons to other methods. For associated R code, see https://CRAN.R-project.org/package=klic and https://github.com/acabassi/klic-pancancer-analysis
Database: OpenAIRE