Description: |
Dimensionality reduction is a critical step in the analysis of single-cell RNA-seq data. The standard approach is to apply a transformation to the count matrix, followed by principal components analysis. However, this approach can spuriously indicate heterogeneity where it does not exist and mask true heterogeneity where it does exist. An alternative approach is to directly model the counts, but existing model-based methods tend to be computationally intractable on large datasets and do not quantify uncertainty in the low-dimensional representation. To address these problems, we develop scGBM, a novel method for model-based dimensionality reduction of single-cell RNA-seq data. scGBM employs a scalable algorithm to fit a Poisson bilinear model to datasets with millions of cells and quantifies the uncertainty in each cell’s latent position. Furthermore, scGBM leverages these uncertainties to assess the confidence associated with a given cell clustering. On real and simulated single-cell data, we find that scGBM produces low-dimensional embeddings that better capture relevant biological information while removing unwanted variation. scGBM is publicly available as an R package. |