Multiple kernel learning for integrative consensus clustering of omic datasets

Authors:	Alessandra Cabassi, Paul D. W. Kirk
Contributors:	Cabassi, Alessandra [0000-0003-1605-652X], Kirk, Paul [0000-0002-5931-7489], Apollo - University of Cambridge Repository
Language:	English
Subject:	Statistics and Probability; FOS: Computer and information sciences; Computer Science - Machine Learning; Consensus; AcademicSubjects/SCI01060; Computer science; Information Storage and Retrieval; Context (language use); Machine Learning (stat.ML); Machine learning; computer.software_genre; Biochemistry; Statistics - Applications; Machine Learning (cs.LG); Methodology (stat.ME); 03 medical and health sciences; Kernel (linear algebra); 0302 clinical medicine; Robustness (computer science); Statistics - Machine Learning; Neoplasms; Consensus clustering; Cluster Analysis; Humans; Applications (stat.AP); Cluster analysis; Molecular Biology; Statistics - Methodology; 030304 developmental biology; 0303 health sciences; Multiple kernel learning; business.industry; Systems Biology; Original Papers; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Kernel (statistics); Benchmark (computing); Artificial intelligence; business; computer; Algorithms
Source:	Bioinformatics
ISSN:	1460-2059 1367-4803
DOI:	10.1093/bioinformatics/btaa593
Description:	Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster-Of-Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets, or of datasets that define conflicting clustering structures, is unclear. We rigorously benchmark COCA and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which each dataset provides a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performance of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. The R packages "klic" and "coca" are available on the Comprehensive R Archive Network. Comment: Manuscript: 18 pages, 6 figures. Supplement: 29 pages, 19 figures. This version contains additional simulation studies and comparisons to other methods. For associated R code, see https://CRAN.R-project.org/package=klic and https://github.com/acabassi/klic-pancancer-analysis
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3029d5ab81f201ef524b0ed42abf7658
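
Illustrative note: the description says that KLIC lets each dataset make a weighted contribution to the final clustering, so that noisy datasets can be down-weighted. The sketch below is not the "klic" package and does not reproduce the authors' method; it only illustrates, under simplifying assumptions, what a weighted combination of per-dataset kernels looks like. The function names, the toy labels, and the hand-picked weights are all hypothetical; in KLIC the weights are learned rather than fixed, and the kernels come from each dataset's clustering structure rather than from a single label vector.

```python
def comembership_kernel(labels):
    """Return an n x n matrix with 1.0 where two items share a label, else 0.0.

    Assumption: one label vector per dataset stands in for that dataset's
    clustering structure.
    """
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def combine_kernels(kernels, weights):
    """Return the convex combination sum_m weights[m] * kernels[m]."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Hypothetical toy data: two datasets agree on the grouping {0, 1} vs {2, 3};
# a third, "noisy" dataset suggests a conflicting grouping.
labels_a = [0, 0, 1, 1]
labels_b = [0, 0, 1, 1]
labels_noisy = [0, 1, 0, 1]

kernels = [comembership_kernel(l) for l in (labels_a, labels_b, labels_noisy)]

# Hand-picked weights (an assumption): the noisy dataset is down-weighted.
weights = [0.45, 0.45, 0.10]
combined = combine_kernels(kernels, weights)

for row in combined:
    print([round(value, 2) for value in row])
```

With these weights, pairs grouped together by the two agreeing datasets keep a high similarity in the combined matrix, while pairs grouped together only by the noisy dataset keep a low one. A final clustering would then be obtained by applying a kernel-based clustering method to the combined matrix, a step this sketch leaves out.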