Machine Learning of Single-Cell Transcriptome Highly Identifies mRNA Signature by Comparing F-Score Selection with DGE Analysis

Author:	Yongchun Zuo, Pengfei Liang, Chunshen Long, Xing Chen, Wuritu Yang, Lei Zheng, Hanshuang Li
Publication year:	2020
Subject:	0301 basic medicine; developmental mRNA signature; Feature selection; Computational biology; Cell fate determination; Biology; Article; 03 medical and health sciences; feature selection; 0302 clinical medicine; Drug Discovery; differential gene expression; Cluster analysis; DGE; Selection (genetic algorithm); lcsh:RM1-950; prediction; single-cell transcriptome; Support vector machine; machine learning; lcsh:Therapeutics. Pharmacology; 030104 developmental biology; Feature (computer vision); 030220 oncology & carcinogenesis; Molecular Medicine; F1 score; Function (biology)
Source:	Molecular Therapy: Nucleic Acids, Vol 20, Pp 155-163 (2020)
ISSN:	2162-2531
Description:	Human preimplantation development is a complex process involving dramatic changes in transcriptional architecture. Identifying key genes is indispensable for a better understanding of its spatiotemporal progression. Although single-cell RNA sequencing (RNA-seq) techniques can provide detailed clustering signatures, identifying the decisive factors remains difficult, and the experiments are costly and time-consuming. Computational methods for identifying effective developmental signature genes are therefore highly desirable. In this study, we developed a predictor called EmPredictor to identify the developmental stages of human preimplantation embryogenesis. First, we compared F-score feature selection with differential gene expression (DGE) analysis to find stage-specific signatures. Then, by training a support vector machine (SVM), we comprehensively evaluated four types of signature subsets. A feature subset of 1,881 genes selected by the F-score algorithm gave the best predictive performance, with the highest accuracy of 93.3% on the cross-validation set. Functional enrichment analysis further showed that the gene set selected by the feature selection method was associated with more development-related pathways and cell fate determination biomarkers. These results indicate that the F-score algorithm should be preferred for detecting key genes in multi-period data from mammalian early development.
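The F-score ranking step named in the abstract can be illustrated with a small sketch. The record does not give the exact formula the authors used, so the code below assumes a common multi-class form: for each gene, the sum over stages of the squared distance between the stage mean and the overall mean, divided by the sum of within-stage sample variances. The function names `f_scores` and `top_k_features` and the toy matrix are hypothetical and not taken from the paper.

```python
import numpy as np


def f_scores(X, y):
    """Multi-class F-score per feature (column) of X.

    Assumed form (not confirmed by the source):
      numerator   = sum over classes of (class mean - overall mean)^2
      denominator = sum over classes of the within-class sample variance
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]  # cells belonging to one stage
        between += (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.var(axis=0, ddof=1)  # needs at least 2 cells per class
    # Guard against division by zero for genes that are constant within classes.
    return between / np.maximum(within, 1e-12)


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = f_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Toy data: 6 cells x 3 genes, two hypothetical stages.
X_toy = np.array([
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [0.8, 4.9, 0.4],
    [9.0, 5.0, 0.7],
    [9.2, 5.2, 0.1],
    [8.8, 4.8, 0.5],
])
y_toy = np.array(["stage_A"] * 3 + ["stage_B"] * 3)

ranked = top_k_features(X_toy, y_toy, k=2)
print(ranked)  # gene 0 separates the two stages and ranks first
```

In a workflow like the one described, the top-ranked gene subset would then be passed to a classifier such as an SVM and evaluated by cross-validation; that step is omitted here.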
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d7e055c97bc7ca8ce67fba4631088fda https://doi.org/10.1016/j.omtn.2020.02.004