Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis

Author:	Dacheng Tao, Lihao Nan, James G. Burchfield, Pengyi Yang, Jean Yee Hwa Yang, Thomas A Geddes, Taiyun Kim
Year of publication:	2019
Subject:	Data Analysis; Single cells; Computer science; Random projection; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; 03 medical and health sciences; Kernel (linear algebra); 0302 clinical medicine; Cluster ensemble; Structural Biology; scRNA-seq; Cluster (physics); Feature (machine learning); Cluster Analysis; Humans; RNA-Seq; Cluster analysis; lcsh:QH301-705.5; Molecular Biology; 030304 developmental biology; 0303 health sciences; Artificial neural network; Cell type identification; Sequence Analysis RNA; business.industry; Research; Applied Mathematics; Pattern recognition; Autoencoder; Computer Science Applications; lcsh:Biology (General); Single-cell transcriptome; Metric (mathematics); lcsh:R858-859.7; Neural Networks Computer; Artificial intelligence; Single-Cell Analysis; Transcriptome; business; Algorithms; 030217 neurology & neurosurgery; Subspace topology
Source:	BMC Bioinformatics, Vol 20, Iss S19, Pp 1-11 (2019)
ISSN:	1471-2105
DOI:	10.1186/s12859-019-3179-5
Description:	Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Because the transcriptome has high feature dimensionality (i.e. a large number of genes is measured in each cell) and only a small fraction of genes are cell type-specific, and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance, and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metrics used. Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses.
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/autoencoder_cluster_ensemble
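The three steps named in the Results (random subspace projection, autoencoder compression, ensemble clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tiny one-hidden-layer tanh autoencoder, the plain k-means, the co-association consensus step and all parameter values are assumptions chosen to keep the example short and self-contained.

```python
import numpy as np

def random_subspace(X, n_genes, rng):
    """Keep a random subset of gene columns (one random subspace projection)."""
    idx = rng.choice(X.shape[1], size=n_genes, replace=False)
    return X[:, idx]

def encode(X, n_hidden, rng, epochs=200, lr=0.01):
    """Train a one-hidden-layer tanh autoencoder by gradient descent; return the codes."""
    X = (X - X.mean(0)) / (X.std(0) + 1e-8)       # standardise each gene
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # encoder
        R = H @ W2 + b2                           # linear decoder
        G = 2.0 * (R - X) / n                     # gradient of squared reconstruction error
        GH = (G @ W2.T) * (1.0 - H ** 2)          # backpropagate through tanh
        W2 -= lr * (H.T @ G); b2 -= lr * G.sum(0)
        W1 -= lr * (X.T @ GH); b1 -= lr * GH.sum(0)
    return np.tanh(X @ W1 + b1)

def kmeans(Z, k, rng, n_iter=50):
    """Plain Lloyd's k-means with random initial centres."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):               # leave empty clusters where they are
                centers[j] = Z[labels == j].mean(0)
    return labels

def autoencoder_cluster_ensemble(X, k, n_members=10, n_genes=50, n_hidden=8, seed=0):
    """Cluster each encoded random subspace, then combine via a co-association matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for _ in range(n_members):
        Z = encode(random_subspace(X, n_genes, rng), n_hidden, rng)
        labels = kmeans(Z, k, rng)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_members                          # fraction of members that co-cluster each pair
    return kmeans(coassoc, k, rng), coassoc       # assumed consensus step: k-means on the rows

# Synthetic data (not from the paper): 60 cells, 200 genes, 20 informative genes, 3 cell types.
rng = np.random.default_rng(1)
truth = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 200))
X[:, :20] += 4.0 * truth[:, None]
labels, coassoc = autoencoder_cluster_ensemble(X, k=3)
```

Each entry of coassoc is the fraction of ensemble members that placed two cells in the same cluster, so the final clustering relies on agreement across many encoded subspaces rather than on any single projection. A different base clusterer could be substituted for the assumed kmeans helper.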
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::650e891e2ac320eaf17869146991c8ca https://doi.org/10.1186/s12859-019-3179-5