Predicting protein structural classes for low-similarity sequences by evaluating different features
Author: | Hong-Yan Lai, Wei Chen, Chao-Qin Feng, Xiao-Juan Zhu, Lin Hao |
---|---|
Year of publication: | 2019 |
Subject: |
Protein structural class
Information Systems and Management; Basis (linear algebra); Computer science; business.industry; Pattern recognition; 02 engineering and technology; Function (mathematics); Tripeptide; Management Information Systems; Similarity (network science); Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Benchmark (computing); 020201 artificial intelligence & image processing; Artificial intelligence; business; Protein secondary structure; Software |
Source: | Knowledge-Based Systems. 163:787-793 |
ISSN: | 0950-7051 |
DOI: | 10.1016/j.knosys.2018.10.007 |
Description: | Protein structural class can provide important clues for understanding protein fold, evolution and function. However, accurately predicting protein structural classes for low-similarity sequences remains a challenging problem. This paper aims to develop a powerful method for predicting protein structural classes for low-similarity sequences. Using an objective and strict benchmark dataset, we first extracted optimal tripeptide compositions (OTC), selected with a feature selection technique, to represent protein samples; this yielded an overall accuracy of 91.1% in jackknife cross-validation. For comparison, we then investigated the accuracies of three popular features: position-specific scoring matrix (PSSM), predicted secondary structure information (PSSI) and average chemical shift (ACS). Finally, to further improve prediction performance, we examined all combinations of the four kinds of features and achieved a maximum accuracy of 96.7% in jackknife cross-validation by combining OTC with ACS, demonstrating that the model is efficient and powerful. Our study provides an important guide for extracting valuable information from protein sequences. |
Database: | OpenAIRE |
External link: |
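
The description mentions tripeptide compositions as the starting feature set. As a rough illustration only, the sketch below computes a plain tripeptide composition vector (20^3 = 8000 dimensions) for one sequence. The record does not state the normalization, the feature selection technique, or the classifier used in the paper, so the normalization here (count divided by the number of valid overlapping tripeptides) and the function name `tripeptide_composition` are assumptions; the feature selection step that yields the "optimal" subset is not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional list of tripeptide frequencies.

    Assumption (not taken from the record): each frequency is the count of
    a tripeptide divided by the number of overlapping windows that contain
    only standard amino acids.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = tripeptide_composition("ACDACD")
    print(len(vec), vec[INDEX["ACD"]])
```

A vector like this would then be reduced by some feature selection method and passed to a classifier evaluated with jackknife (leave-one-out) cross-validation, as the description outlines.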