CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
Author: | Xiangxiang Zeng, Yun Zuo, Jianyuan Lin, Quan Zou, Xiangrong Liu |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: |
Support Vector Machine; QH301-705.5; Protein Carbonylation; 030303 biophysics; Computer applications to medicine. Medical informatics; R858-859.7; Carbonylation; Biochemistry; Rotation forest; 03 medical and health sciences; Structural Biology; Resampling; Oversampling; Biology (General); Molecular Biology; 030304 developmental biology; Mathematics; K-means similarity-based undersampling; 0303 health sciences; Methodology Article; Applied Mathematics; Protein post-translational modification; k-means clustering; Proteins; Computer Science Applications; Support vector machine; The integrated classifier; Undersampling; False positive rate; Protein Processing Post-Translational; Algorithm; Algorithms |
Source: | BMC Bioinformatics, Vol 22, Iss 1, Pp 1-17 (2021) |
ISSN: | 1471-2105 |
Description: | Background Carbonylation is a non-enzymatic, irreversible protein post-translational modification in which the side chains of amino acid residues are attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiological processes of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumors. Current experimental approaches used to identify carbonylation sites are expensive, time-consuming, and limited in the number of proteins they can process. Computational prediction of carbonylated residue locations enhances the functional characterization of proteins.
Results In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. A resampling method that combines K-means similarity-based undersampling with the synthetic minority oversampling technique (SMOTE-KSU) was incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest, with support vector machines as sub-classifiers, divides three types of feature spaces into several subsets. By tenfold cross-validation, CarSite-II achieved Matthews correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, false positive rates of 0.2628/0.1084/0.1383/0.1313, and false negative rates of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites, respectively. On the independent test dataset, CarSite-II yielded MCC values of 0.6358/0.2910/0.4629/0.3685, false positive rates of 0.0165/0.0203/0.0188/0.0094, and false negative rates of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II achieves remarkably better performance than all currently available prediction tools.
Conclusion The results show that CarSite-II achieved better performance than the five currently available programs and demonstrate the usefulness of the SMOTE-KSU resampling approach and the integration algorithm. For the convenience of experimental scientists, the CarSite-II web tool is available at http://47.100.136.41:8081/ |
Database: | OpenAIRE |
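The evaluation measures quoted in the description (MCC, false positive rate, false negative rate) all derive from the four counts of a binary confusion matrix. The following is a minimal Python sketch of the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (MCC, false positive rate, false negative rate).

    tp, fp, tn, fn are confusion-matrix counts for a binary classifier
    (here: carbonylated site = positive, non-carbonylated = negative).
    """
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # FPR = FP / (FP + TN): share of true negatives flagged as positive
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR = FN / (FN + TP): share of true positives that were missed
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return mcc, fpr, fnr


# Hypothetical counts for illustration only (not taken from the paper).
mcc, fpr, fnr = binary_metrics(tp=35, fp=10, tn=590, fn=5)
print(f"MCC={mcc:.4f} FPR={fpr:.4f} FNR={fnr:.4f}")
```

Because MCC uses all four counts, it stays informative on imbalanced data such as carbonylation sites, where negatives far outnumber positives; this is one reason it is commonly reported alongside the two error rates.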