Prediction of Protein–ATP Binding Residues Based on Ensemble of Deep Convolutional Neural Networks and LightGBM Algorithm

Authors: Jiazhi Song, Guixia Liu, Yanchun Liang, Jingqing Jiang, Ping Zhang
Language: English
Year of publication: 2021
Subject:
0301 basic medicine
Support Vector Machine
Computer science
protein–ATP binding residue prediction
02 engineering and technology
Convolutional neural network
Catalysis
Article
LightGBM
Inorganic Chemistry
lcsh:Chemistry
Machine Learning
03 medical and health sciences
Adenosine Triphosphate
deep convolutional neural network
Prediction methods
0202 electrical engineering, electronic engineering, information engineering
Humans
Amino Acid Sequence
Physical and Theoretical Chemistry
Molecular Biology
lcsh:QH301-705.5
Spectroscopy
Organic Chemistry
Computational Biology
Proteins
General Medicine
Matthews correlation coefficient
Ensemble learning
Computer Science Applications
Random forest
Support vector machine
030104 developmental biology
lcsh:Biology (General)
lcsh:QD1-999
Weight distribution
Benchmark (computing)
ensemble learning
020201 artificial intelligence & image processing
Neural Networks, Computer
protein primary sequence
Carrier Proteins
Algorithm
Algorithms
Protein Binding
Source: International Journal of Molecular Sciences
Volume 22
Issue 2
International Journal of Molecular Sciences, Vol 22, Iss 2, p 939 (2021)
ISSN: 1422-0067
DOI: 10.3390/ijms22020939
Description: Accurately identifying protein–ATP binding residues is important for protein function annotation and drug design. Previous studies have used classic machine-learning algorithms such as support vector machine (SVM) and random forest to predict protein–ATP binding residues; however, as new machine-learning techniques are developed, prediction performance can be improved further. In this paper, an ensemble predictor that combines deep convolutional neural networks and LightGBM through ensemble learning is proposed. Three subclassifiers have been developed: a multi-incepResNet-based predictor, a multi-Xception-based predictor, and a LightGBM predictor. The final prediction is a combination of the outputs of the three subclassifiers with an optimized weight distribution. We examined the performance of the proposed predictor on two datasets: a classic ATP-binding benchmark dataset and a newly proposed ATP-binding dataset. The predictor achieved area under the curve (AUC) values of 0.925 and 0.902 and Matthews correlation coefficient (MCC) values of 0.639 and 0.642, respectively, both of which are better than those of other state-of-the-art prediction methods.
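Note: the abstract does not say how the weight distribution is optimized or how scores are thresholded. The following is only a minimal sketch of one plausible approach, assuming a coarse grid search over weights that sum to 1, a fixed 0.5 decision threshold, and MCC as the selection criterion. The function names (`mcc`, `combine`, `search_weights`) and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def combine(probs, weights):
    """Weighted sum of per-residue scores from several subclassifiers."""
    return [sum(w * p[i] for w, p in zip(weights, probs))
            for i in range(len(probs[0]))]

def search_weights(probs, y_true, step=0.1, threshold=0.5):
    """Grid search over three non-negative weights that sum to 1.

    Assumption: weights are selected by MCC on a validation set;
    the paper may use a different procedure.
    """
    n = round(1 / step)
    best_weights, best_score = None, -2.0
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            weights = (a / n, b / n, c / n)
            preds = [1 if s >= threshold else 0
                     for s in combine(probs, weights)]
            score = mcc(y_true, preds)
            if score > best_score:
                best_weights, best_score = weights, score
    return best_weights, best_score

# Toy, invented validation data: 1 = ATP-binding residue, 0 = non-binding.
y_val = [1, 0, 1, 0, 0, 1, 0, 0]
p_incepresnet = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.2]  # hypothetical scores
p_xception    = [0.6, 0.1, 0.7, 0.3, 0.4, 0.4, 0.2, 0.6]  # hypothetical scores
p_lightgbm    = [0.8, 0.4, 0.6, 0.2, 0.3, 0.3, 0.6, 0.1]  # hypothetical scores

weights, score = search_weights([p_incepresnet, p_xception, p_lightgbm], y_val)
print("weights:", weights, "validation MCC:", round(score, 3))
```

Because the grid includes the corner cases (1, 0, 0), (0, 1, 0), and (0, 0, 1), the selected combination is never worse on the validation set than the best single subclassifier under the same threshold.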
Database: OpenAIRE