Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data
Author: | Stefan Canzar, Francisca Rojas Ringeling, Van Hoan Do |
---|---|
Year of publication: | 2020 |
Subject: | Computer science; Method; Scale (descriptive set theory); Biology; 03 medical and health sciences; 0302 clinical medicine; Genetics; Cluster Analysis; RNA-Seq; Time complexity; Genetics (clinical); 030304 developmental biology; 0303 health sciences; business.industry; Gene Expression Profiling; Contrast (statistics); Pattern recognition; Sparse approximation; Spectral clustering; Expression (mathematics); Identification (information); Scalability; Embedding; Artificial intelligence; Single-Cell Analysis; business; 030217 neurology & neurosurgery; Algorithms |
Source: | Genome Res |
DOI: | 10.1101/2020.06.15.151910 |
Description: | A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose a method, Specter, that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks, which are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression, we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter. |
Database: | OpenAIRE |
External link: |
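
The landmark idea in the description can be illustrated with a minimal sketch. This is not Specter's code: it follows the generic landmark-based spectral clustering recipe (each cell is represented by weights to its nearest landmarks, and the embedding comes from an SVD of that n × p matrix), and every function name, parameter, and default below is a hypothetical choice made for illustration. The consensus step, which runs k-means on stacked one-hot labels, is a simple stand-in and should not be read as the ensemble scheme of the paper. A dense NumPy array is used for brevity where a real implementation would presumably use a sparse matrix.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Distance of every point to its closest already-chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def landmark_spectral_embedding(X, n_landmarks, n_neighbors, k, rng):
    """Embed all n cells via their affinities to p randomly chosen landmarks."""
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=n_landmarks, replace=False)]
    # Squared distances from every cell to every landmark: O(n * p * dim).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    # Keep only the n_neighbors closest landmarks per cell (sparse rows).
    idx = np.argpartition(d2, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    near = d2[rows, idx]
    sigma2 = np.median(near) + 1e-12  # assumed bandwidth heuristic
    Z = np.zeros((n, n_landmarks))
    Z[rows, idx] = np.exp(-near / (2.0 * sigma2))
    Z /= Z.sum(axis=1, keepdims=True)
    # Scale columns so that Z_hat @ Z_hat.T is a normalised affinity matrix.
    Z_hat = Z / np.sqrt(np.maximum(Z.sum(axis=0), 1e-12))
    # SVD of an n x p matrix costs O(n * p^2), i.e. linear in n for fixed p.
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    emb = U[:, :k]
    return emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)


def ensemble_cluster(X, k, n_runs=5, n_landmarks=60, n_neighbors=5, seed=0):
    """Combine several landmark-based clusterings into one consensus labelling."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_runs):
        emb = landmark_spectral_embedding(X, n_landmarks, n_neighbors, k, rng)
        labels = kmeans(emb, k, rng)
        votes.append(np.eye(k)[labels])  # one-hot membership for this run
    # Cells that are co-clustered in most runs get similar rows in H.
    H = np.hstack(votes)
    return kmeans(H, k, rng)
```

Calling `ensemble_cluster(X, k=3)` on a cells-by-features array `X` returns one integer label per cell. Because every cell contributes a row to the landmark matrix, no cell is discarded, which is the contrast with subsampling that the description draws.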