PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features
Author: | Mehedi Hasan, Watshara Shoombuatong, Hiroyuki Kurata, Firda Nurul Auliah, Andi Nur Nilamyani, Mohammad Ali Moni |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: |
0301 basic medicine
Support Vector Machine Computer science Computational resource Catalysis Cross-validation Article Inorganic Chemistry lcsh:Chemistry 03 medical and health sciences chemistry.chemical_compound 0302 clinical medicine Molecular function Sequence Analysis Protein Feature (machine learning) Physical and Theoretical Chemistry Molecular Biology lcsh:QH301-705.5 Spectroscopy RFE feature selection Sequence nitrotyrosine business.industry Nitrotyrosine Organic Chemistry Computational Biology Proteins Pattern recognition General Medicine Computer Science Applications Random forest Identification (information) 030104 developmental biology machine learning chemistry post-translational modification lcsh:Biology (General) lcsh:QD1-999 030220 oncology & carcinogenesis Tyrosine Artificial intelligence feature encoding business Protein Processing Post-Translational |
Source: | International Journal of Molecular Sciences, Vol 22, Iss 2704, p 2704 (2021); International Journal of Molecular Sciences, Volume 22, Issue 5 |
ISSN: | 1661-6596 1422-0067 |
Description: | Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identifying site-specific nitration of tyrosine is a prerequisite for understanding the molecular function of nitrated proteins. Thanks to progress in machine learning, computational prediction can play a vital role before biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features, including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. Important features were selected by recursive feature elimination using a random forest (RF) classifier. Finally, we linearly combined the RF probability scores generated by the individual RF models, each of which employs a single encoding. The resulting PredNTS predictor achieved an area under the curve (AUC) of 0.910 in five-fold cross-validation and outperformed the existing predictors on a comprehensive, independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web application and the curated PredNTS datasets are publicly available. |
Database: | OpenAIRE |
External link: |
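
To make the encodings named in the description more concrete, the following is a minimal sketch of CKSAAP and binary encoding for a sequence window, plus a weighted linear combination of per-encoding scores. It is not the authors' implementation: the function names, the example window, the maximum gap `k_max`, and the weights are all illustrative assumptions, and the record does not state the window length or the weights used by PredNTS.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps 0..k_max.

    For each gap k, count every residue pair separated by k positions
    and divide by the number of pair positions examined.
    k_max is an assumed default, not a value taken from the paper.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(window) - k - 1  # number of (i, i + k + 1) positions
        for i in range(max(total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # ignore pairs with gap or unknown symbols
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return features


def binary_encoding(window: str) -> list[int]:
    """One-hot encode each residue; unknown symbols become all zeros."""
    vector: list[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def combine_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted linear combination of per-encoding probability scores."""
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical window centred on a tyrosine (Y); real windows are longer.
window = "ACDYACD"
cksaap_features = cksaap(window, k_max=1)   # 2 gaps x 400 pairs
binary_features = binary_encoding(window)   # 7 residues x 20 symbols

# Hypothetical per-encoding scores and equal weights, for illustration only.
final_score = combine_scores(
    {"cksaap": 0.8, "binary": 0.6},
    {"cksaap": 0.5, "binary": 0.5},
)
```

In a full pipeline, each feature vector would feed its own classifier, and the resulting probability scores would be combined as in `combine_scores`. The feature selection and cross-validation steps described above are omitted here.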