An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features

Authors: Pasi K. Korhonen, Neil D. Young, Túlio de Lima Campos, Robin B. Gasser
Language: English
Year of publication: 2019
Subject:
PPI: Protein-protein interaction
Eukaryotes
lcsh:Biotechnology
Biophysics
ML: Machine-learning
Biology
Machine learning
computer.software_genre
Biochemistry
Essential genes
03 medical and health sciences
0302 clinical medicine
Protein sequencing
CRISPR: Clustered regularly interspaced short palindromic repeats
Structural Biology
RNA interference
lcsh:TP248.13-248.65
Essentiality prediction
Genetics
ROC-AUC: Area under the receiver operating characteristic curve
NN: Artificial neural network
GO: Gene ontology
Gene
Gene knockout
030304 developmental biology
Whole genome sequencing
0303 health sciences
Gene knockdown
business.industry
GLM: Generalised linear model
OGEE: Online GEne essentiality database
Computer Science Applications
SPLS: Sparse partial least squares
RNAi: RNA interference
Essential gene
030220 oncology & carcinogenesis
RF: Random Forest
GBM: Gradient boosting method
SVM: Support-Vector machine
Artificial intelligence
GI: Genetic interaction
PR-AUC: Area under the precision-recall curve
business
computer
Biotechnology
Research Article
Source: Computational and Structural Biotechnology Journal
Computational and Structural Biotechnology Journal, Vol 17, Iss, Pp 785-796 (2019)
ISSN: 2001-0370
Description: The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for understanding the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs comprise only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
Database: OpenAIRE