RF-SVM: Identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine
Author: | Ya Gao, Jianwei Ni, Yanping Zhang |
---|---|
Year of publication: | 2021 |
Subject: |
chemistry.chemical_classification
Support Vector Machine; Computer science; information science; DNA replication; Computational Biology; DNA; Computational biology; Biochemistry; Amino acid; Random forest; DNA-Binding Proteins; Support vector machine; Canberra distance; chemistry; Structural Biology; Feature (machine learning); Databases, Protein; Representation (mathematics); Molecular Biology; Pseudo amino acid composition |
Source: | Proteins: Structure, Function, and Bioinformatics. 90:395-404 |
ISSN: | 1097-0134 0887-3585 |
Description: | Protein-DNA interactions play an important role in biological processes such as DNA replication, repair, and modification. A key step toward understanding these functions is the identification of DNA-binding proteins. We propose a DNA-binding protein predictor, RF-SVM, which uses four types of features: pseudo amino acid composition (PseAAC), amino acid distribution (AAD), adjacent amino acid composition frequency (ACF), and Local-DPP. The Random Forest algorithm is used to select the top 174 features, which are then used to build the predictor model with a support vector machine (SVM) on the training dataset UniSwiss-Tr. Finally, RF-SVM is compared with other existing methods on the test dataset UniSwiss-Tst. The experimental results show that RF-SVM achieves an accuracy of 84.25%. We also find that the amino acid physicochemical properties OOBM770101(H), CIDH920104(H), MIYS990104(H), NISK860101(H), VINM940103(H), and SNEP660101(A) contribute to the prediction of DNA-binding proteins. The main code and datasets are available at https://github.com/NiJianWei996/RF-SVM. |
Database: | OpenAIRE |
External link: |
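
The description above says that a Random Forest ranks the features and the top 174 are passed to the SVM. The following is a minimal sketch of that rank-and-keep-top-k step only, written in plain Python. It is not the authors' code (see the repository linked in the description for that). The function names `top_k_feature_indices` and `keep_columns`, the toy importance scores, and the toy feature matrix are all hypothetical. In practice the importance scores would come from a trained random forest, and the reduced vectors would be passed to an SVM classifier.

```python
from typing import List, Sequence


def top_k_feature_indices(importances: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest importance scores, highest first."""
    if not 0 < k <= len(importances):
        raise ValueError("k must be between 1 and the number of features")
    # Sort indices by descending importance; ties keep the lower index first.
    ranked = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return ranked[:k]


def keep_columns(rows: Sequence[Sequence[float]],
                 columns: Sequence[int]) -> List[List[float]]:
    """Project every feature vector onto the selected columns, in that order."""
    return [[row[c] for c in columns] for row in rows]


# Toy data (hypothetical): 3 proteins x 5 features with made-up importances.
# The abstract reports keeping 174 features; k=3 is used here for readability.
toy_importances = [0.05, 0.40, 0.10, 0.30, 0.15]
toy_features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
]

selected = top_k_feature_indices(toy_importances, k=3)
reduced = keep_columns(toy_features, selected)
print(selected)
print(reduced)
```

The same column indices selected on the training set would have to be reused on the test set so that both are described by identical features.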