Analysis of a Single Cell RNA-seq Workflow by Random Matrix Theory Methods.

Author: Leviyang S; Department of Mathematics and Statistics, Georgetown University, Washington, 20057, DC, USA. Sivan.Leviyang@georgetown.edu.
Language: English
Source: Bulletin of mathematical biology [Bull Math Biol] 2024 Nov 25; Vol. 87 (1), pp. 4. Date of Electronic Publication: 2024 Nov 25.
DOI: 10.1007/s11538-024-01376-z
Abstract: Single cell RNA-seq (scRNAseq) workflows typically start with a count matrix and end with the clustering of sampled cells. While a range of methods have been developed to cluster scRNAseq datasets, no theoretical tools exist to explain why a particular cluster exists or why a hypothesized cluster is missing. Recently, several authors have shown that eigenvalues of scRNAseq count matrices can be approximated using random matrix models. In this work, we extend this earlier work to the study of an scRNAseq workflow. We model scaled count matrices using random matrices with normally distributed entries. Using these random matrix models, we quantify the differential expression of a cluster and develop predictions for the workflow, and in particular for clustering, as a function of the differential expression. We also use results from random matrix theory (RMT) to develop predictive formulas for portions of the scRNAseq workflow. Using simulated and real datasets, we show that our predictions are accurate if certain conditions hold on differential expression, with our RMT-based predictions requiring particularly stringent conditions. We find that real datasets violate these conditions, leading to bias in our predictions, but our predictions are better than a naive estimator, and we point out future work that can improve the predictions. To our knowledge, our formulas represent the first predictive results for scRNAseq workflows.
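Illustration (not from the paper): the general idea of modeling a scaled count matrix as a Gaussian random matrix and asking when a differentially expressed cluster becomes visible can be sketched with the standard rank-one spiked model. The sketch below is a simplification under stated assumptions: entries are i.i.d. N(0, 1), the matrix is not centered, and a cluster is a block of cells whose mean is shifted by `delta` in a block of genes. The function names and parameters are hypothetical, and the outlier formula is the textbook result for rank-one perturbations of rectangular Gaussian matrices, not the predictive formulas developed in the paper.

```python
import numpy as np


def bulk_edge(n_cells, n_genes):
    """Asymptotic largest singular value of X / sqrt(n_cells) for pure noise."""
    return 1.0 + np.sqrt(n_genes / n_cells)


def spike_strength(n_cluster, n_de_genes, n_cells, delta):
    """Singular value of the planted rank-one mean shift after 1/sqrt(n_cells) scaling."""
    return delta * np.sqrt(n_cluster * n_de_genes / n_cells)


def predicted_outlier(theta, c):
    """Asymptotic top singular value for spike strength theta and aspect ratio c.

    Below the detection threshold c**0.25 the spike is absorbed into the bulk,
    so the prediction is just the bulk edge 1 + sqrt(c).
    """
    if theta <= c ** 0.25:
        return 1.0 + np.sqrt(c)
    return np.sqrt((1.0 + theta**2) * (c + theta**2)) / theta


def simulate_top_singular_value(n_cells, n_genes, n_cluster, n_de_genes, delta, seed=0):
    """Draw a Gaussian matrix, plant a shifted block, return the top singular value."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_cells, n_genes))
    # Hypothetical cluster: the first n_cluster cells are shifted by delta
    # in the first n_de_genes genes (a rank-one perturbation of the mean).
    x[:n_cluster, :n_de_genes] += delta
    return np.linalg.svd(x / np.sqrt(n_cells), compute_uv=False)[0]


if __name__ == "__main__":
    n_cells, n_genes = 1000, 500
    c = n_genes / n_cells
    for delta in (0.0, 0.2, 0.6):
        theta = spike_strength(200, 50, n_cells, delta)
        top = simulate_top_singular_value(n_cells, n_genes, 200, 50, delta)
        print(f"delta={delta:.1f} theta={theta:.2f} "
              f"simulated={top:.3f} predicted={predicted_outlier(theta, c):.3f}")
```

In this toy setting, a cluster whose spike strength falls below the threshold leaves the top singular value at the noise edge, which is one way a hypothesized cluster can be missing from a downstream analysis.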
Competing Interests: Declarations. Conflict of interest: The author declares no conflict of interest.
(© 2024. The Author(s), under exclusive licence to the Society for Mathematical Biology.)
Database: MEDLINE