Reproducibility of Methods to Detect Differentially Expressed Genes from Single-Cell RNA Sequencing
Author: | Wenjiang Deng, Fengyun Gu, Tian Mou, Yudi Pawitan, Trung Nghia Vu |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
0301 basic medicine; Wilcoxon signed-rank test; lcsh:QH426-470; Computer science; genetic processes; Computational biology; differential expression; 03 medical and health sciences; 0302 clinical medicine; False positive paradox; Genetics; natural sciences; Gene; Genetics (clinical); Statistical hypothesis testing; Original Research; Reproducibility; rediscovery rate; RNA; RNA sequencing; single cell; lcsh:Genetics; 030104 developmental biology; Differentially expressed genes; Poor control; comparison; 030220 oncology & carcinogenesis; Molecular Medicine |
Source: | Frontiers in Genetics, Vol 10 (2020) |
ISSN: | 1664-8021 |
DOI: | 10.3389/fgene.2019.01331 |
Description: | Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Because single-cell data have unique distributional characteristics, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of nine tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods that were originally developed for bulk-cell RNA-seq data but have been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, have worse RDR performance than the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell-based methods perform similarly to the methods designed for scRNA-seq data. For lowly expressed genes, however, performance varies substantially: edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon perform similarly in the real data sets. Overall, the scRNA-seq-based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells. |
Database: | OpenAIRE |
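
The description compares methods by the rediscovery rate (RDR) of top-ranked genes but does not spell out how it is computed. As a rough illustration only, the sketch below assumes one common reading: rank genes by p-value in a training split, take the top k, and report the fraction that are also significant in a validation split. The function name `rediscovery_rate`, the `alpha` threshold, and the dictionary inputs are hypothetical and are not taken from the paper or from any of the tools it names.

```python
def rediscovery_rate(train_pvalues, valid_pvalues, top_k, alpha=0.05):
    """Fraction of the top_k training genes that are also significant in validation.

    Assumed definition for illustration; inputs map gene name -> p-value.
    """
    # Rank genes by training p-value, smallest (most significant) first.
    ranked = sorted(train_pvalues, key=train_pvalues.get)
    top = ranked[:top_k]
    if not top:
        return 0.0
    # A top gene counts as rediscovered if its validation p-value is <= alpha.
    # Genes missing from the validation results are treated as not rediscovered.
    hits = sum(1 for gene in top if valid_pvalues.get(gene, 1.0) <= alpha)
    return hits / len(top)


# Toy p-values, invented purely to exercise the function.
train = {"g1": 0.001, "g2": 0.01, "g3": 0.2, "g4": 0.5}
valid = {"g1": 0.02, "g2": 0.3, "g3": 0.01, "g4": 0.9}
rdr_top2 = rediscovery_rate(train, valid, top_k=2)
```

With these toy values the top two training genes are `g1` and `g2`, and only `g1` passes the validation threshold, so `rdr_top2` is 0.5. Under this reading, a method that is too liberal would tend to show a lower RDR, since more of its top-ranked calls fail to replicate.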