PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features

Authors: Mehedi Hasan, Watshara Shoombuatong, Hiroyuki Kurata, Firda Nurul Auliah, Andi Nur Nilamyani, Mohammad Ali Moni
Language: English
Year of publication: 2021
Subjects:
0301 basic medicine
Support Vector Machine
Computer science
Computational resource
Catalysis
Cross-validation
Article
Inorganic Chemistry
lcsh:Chemistry
03 medical and health sciences
chemistry.chemical_compound
0302 clinical medicine
Molecular function
Sequence Analysis, Protein
Feature (machine learning)
Physical and Theoretical Chemistry
Molecular Biology
lcsh:QH301-705.5
Spectroscopy
RFE feature selection
Sequence
nitrotyrosine
business.industry
Nitrotyrosine
Organic Chemistry
Computational Biology
Proteins
Pattern recognition
General Medicine
Computer Science Applications
Random forest
Identification (information)
030104 developmental biology
machine learning
chemistry
post-translational modification
lcsh:Biology (General)
lcsh:QD1-999
030220 oncology & carcinogenesis
Tyrosine
Artificial intelligence
feature encoding
business
Protein Processing, Post-Translational
Source: International Journal of Molecular Sciences, Vol 22, Iss 5, p 2704 (2021)
ISSN: 1661-6596, 1422-0067
Description: Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identifying site-specific nitration of tyrosine is a prerequisite for understanding the molecular function of nitrated proteins. With advances in machine learning, computational prediction can play a vital role ahead of biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features, including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. Important features were selected by recursive feature elimination using a random forest classifier. Finally, we linearly combined the probability scores of several random forest (RF) models, each trained on a single encoding. The resulting PredNTS predictor achieved an area under the curve (AUC) of 0.910 in five-fold cross-validation and outperformed existing predictors on a comprehensive independent dataset. Furthermore, we compared several machine learning algorithms to demonstrate the superiority of the RF algorithm. PredNTS is a useful computational resource for predicting nitrotyrosine sites. The PredNTS web application and its curated datasets are publicly available.
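Two of the encodings named in the abstract (CKSAAP and binary) and the linear combination of per-model probability scores can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PredNTS implementation. The helper names (`cksaap`, `binary`, `combine_scores`), the gap range k_max = 3, the 9-residue example window, the "-" padding symbol, the weights, and the weight-sum normalization are all hypothetical. K-mer and AAindex encodings, recursive feature elimination, and the RF models themselves are omitted. In practice these could be built with a library such as scikit-learn (`RandomForestClassifier`, `RFE`).

```python
from itertools import product

# The 20 standard amino acids; any other symbol (e.g. a hypothetical "-" used to
# pad windows near the protein termini) is ignored by the encoders below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each k, counts residue pairs (window[i], window[i + k + 1]) and divides
    by the number of valid pairs, giving 400 frequencies per gap.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs that contain a non-standard symbol
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def binary(window: str) -> list[int]:
    """One-hot encode each position over the 20 residues (all zeros for padding)."""
    return [int(res == aa) for res in window for aa in AMINO_ACIDS]


def combine_scores(scores: list[float], weights: list[float]) -> float:
    """Weighted linear combination of per-encoding RF probability scores.

    Dividing by the weight sum (an assumption of this sketch) keeps the result
    in [0, 1] when the scores are in [0, 1] and the weights are non-negative
    with a positive sum.
    """
    if len(scores) != len(weights) or not scores:
        raise ValueError("need exactly one weight per score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    window = "KLAEYTDGR"  # hypothetical 9-residue window centred on tyrosine (Y)
    print(len(cksaap(window)))  # 4 gaps x 400 pairs = 1600 features
    print(len(binary(window)))  # 9 positions x 20 residues = 180 features
    # Hypothetical probabilities from four single-encoding RF models, equal weights
    print(combine_scores([0.8, 0.6, 0.9, 0.7], [1.0, 1.0, 1.0, 1.0]))
```

Each encoding would feed its own classifier, and only the resulting probabilities are fused. This keeps the final step a simple weighted average whose weights can be tuned on validation data.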
Database: OpenAIRE