Multiple kernel learning for integrative consensus clustering of omic datasets
Authors: | Alessandra Cabassi, Paul D. W. Kirk |
---|---|
Contributors: | Cabassi, Alessandra [0000-0003-1605-652X], Kirk, Paul [0000-0002-5931-7489], Apollo - University of Cambridge Repository |
Language: | English |
Subject: |
Statistics and Probability; FOS: Computer and information sciences; Computer Science - Machine Learning; Consensus; AcademicSubjects/SCI01060; Computer science; Information Storage and Retrieval; Context (language use); Machine Learning (stat.ML); Machine learning; Biochemistry; Statistics - Applications; Machine Learning (cs.LG); Methodology (stat.ME); 03 medical and health sciences; Kernel (linear algebra); 0302 clinical medicine; Robustness (computer science); Statistics - Machine Learning; Neoplasms; Consensus clustering; Cluster analysis; Humans; Applications (stat.AP); Molecular Biology; Statistics - Methodology; 030304 developmental biology; 0303 health sciences; Multiple kernel learning; Systems Biology; Original Papers; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Kernel (statistics); Benchmark (computing); Artificial intelligence; Algorithms |
Source: | Bioinformatics |
ISSN: | 1460-2059 1367-4803 |
DOI: | 10.1093/bioinformatics/btaa593 |
Description: | Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster-Of-Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets, or datasets that define conflicting clustering structures, is unclear. We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. R packages "klic" and "coca" are available on the Comprehensive R Archive Network. Comment: Manuscript: 18 pages, 6 figures. Supplement: 29 pages, 19 figures. This version contains additional simulation studies and comparisons to other methods. For associated R code, see https://CRAN.R-project.org/package=klic and https://github.com/acabassi/klic-pancancer-analysis |
Database: | OpenAIRE |
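
Illustrative note: the description says that in KLIC each dataset provides a weighted contribution to the final clustering, so that noisy datasets can be down-weighted. The sketch below is only a toy illustration of that idea and is not the interface of the "klic" R package. It assumes one kernel per dataset, here a simple co-clustering matrix built from hypothetical cluster labels, and combines the kernels with weights that are fixed by hand rather than learned. The function names `coclustering_kernel` and `combine_kernels`, the labels, and the weights are all assumptions made for the example.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def coclustering_kernel(labels: Sequence[int]) -> Matrix:
    """Build an n x n matrix whose (i, j) entry is 1.0 when items i and j
    share a cluster label and 0.0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Return the weighted sum of same-sized kernel matrices.

    Weights are required to be non-negative and to sum to one, so each
    dataset contributes a fraction of the combined kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical cluster labels for four items in two datasets.
informative = [0, 0, 1, 1]  # assumed to reflect the structure of interest
noisy = [0, 1, 0, 1]        # assumed to be an uninformative dataset

# Hand-picked weights (an assumption): the noisy dataset is down-weighted.
combined = combine_kernels(
    [coclustering_kernel(informative), coclustering_kernel(noisy)],
    [0.9, 0.1],
)

for row in combined:
    print(row)
```

In the combined matrix, pairs grouped together by the informative dataset keep a high similarity, while pairs grouped together only by the noisy dataset receive a low one. In a multiple kernel learning setting the weights would be estimated from the data instead of being set manually, and the combined kernel would then be passed to a kernel-based clustering method.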