An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features
Authors: | Pasi K. Korhonen, Neil D. Young, Túlio de Lima Campos, Robin B. Gasser |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
Essential genes; Essentiality prediction; Eukaryotes; Machine learning; Artificial intelligence; Protein sequencing; Whole genome sequencing; Gene; Gene knockout; Gene knockdown; Genetics; Biology; Biochemistry; Biophysics; Structural Biology; Biotechnology; Computer Science Applications; Research Article
Abbreviations: CRISPR (Clustered regularly interspaced short palindromic repeats); GBM (Gradient boosting method); GI (Genetic interaction); GLM (Generalised linear model); GO (Gene ontology); ML (Machine-learning); NN (Artificial neural network); OGEE (Online GEne essentiality database); PPI (Protein-protein interaction); PR-AUC (Area under the precision-recall curve); RF (Random Forest); RNAi (RNA interference); ROC-AUC (Area under the receiver operating characteristic curve); SPLS (Sparse partial least squares); SVM (Support-Vector machine)
Classification: lcsh:Biotechnology; lcsh:TP248.13-248.65; 03 medical and health sciences; 0302 clinical medicine; 030220 oncology & carcinogenesis; 0303 health sciences; 030304 developmental biology; computer.software_genre; business.industry |
Source: | Computational and Structural Biotechnology Journal, Vol 17, Iss, Pp 785-796 (2019) |
ISSN: | 2001-0370 |
Description: | The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs make up only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine-learning approaches. |
Database: | OpenAIRE |
External link: |