The impact of package selection and versioning on single-cell RNA-seq analysis.

Authors: Rich JM (Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; USC-Caltech MD/PhD Program, Keck School of Medicine, Los Angeles, CA, 90033, USA); Moses L (Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA); Einarsson PH (Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, Reykjavík, Iceland); Jackson K (Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; USC-Caltech MD/PhD Program, Keck School of Medicine, Los Angeles, CA, 90033, USA); Luebbert L (Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA); Booeshaghi AS (Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA); Antonsson S (Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, Reykjavík, Iceland); Sullivan DK (Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA); Bray N (Boston, MA); Melsted P (Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, Reykjavík, Iceland); Pachter L (Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA; Lead Contact).
Language: English
Source: bioRxiv: the preprint server for biology [bioRxiv] 2024 Apr 11. Date of Electronic Publication: 2024 Apr 11.
DOI: 10.1101/2024.04.04.588111
Abstract: Standard single-cell RNA-sequencing (scRNA-seq) analysis workflows convert raw read data into cell-gene count matrices through sequence alignment, then apply downstream analyses including filtering, highly variable gene selection, dimensionality reduction, clustering, and differential expression analysis. Seurat and Scanpy are the most widely used packages implementing such workflows, and are generally thought to implement individual steps similarly. We investigate in detail the algorithms and methods underlying Seurat and Scanpy and find that their outputs in fact differ considerably. The extent of the differences between the programs is approximately equivalent to the variability that would be introduced in benchmarking scRNA-seq datasets by sequencing less than 5% of the reads or analyzing less than 20% of the cell population. Additionally, distinct versions of Seurat and Scanpy can produce very different results, especially during parts of differential expression analysis. Our analysis highlights the need for users of scRNA-seq to carefully assess the tools on which they rely, and the importance of developers of scientific software prioritizing transparency, consistency, and reproducibility in their tools.
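Illustrative note: one simple way to quantify how much two packages (or two versions of one package) disagree on a set-valued output, such as the selected highly variable genes, is the Jaccard index of the two sets. The minimal Python sketch below assumes that metric for illustration only; it is not necessarily the measure used by the authors. The function name `jaccard_index` and both gene lists are hypothetical and invented for the example.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of labels."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical gene lists, made up for illustration (not real tool output).
hvgs_tool_a = ["CD3E", "MS4A1", "LYZ", "NKG7", "PPBP"]
hvgs_tool_b = ["CD3E", "MS4A1", "LYZ", "GNLY", "FCGR3A"]

overlap = jaccard_index(hvgs_tool_a, hvgs_tool_b)
print(f"Jaccard index: {overlap:.2f}")
```

A value of 1 means the two gene sets are identical and 0 means they share nothing; the same function could be applied to marker-gene lists or to the cells assigned to a given cluster.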
Competing Interests: The authors declare no competing interests.
Database: MEDLINE