Sequence representation and prediction of protein secondary structure for structural motifs in twilight zone proteins
Author: | Kanaka Durga Kedarisetti, Lukasz Kurgan |
---|---|
Year of publication: | 2006 |
Subject: |
Sequence analysis; Structural alignment; Amino Acid Motifs; Bioengineering; Sequence alignment; Biology; Bioinformatics; Biochemistry; Protein Structure, Secondary; Analytical Chemistry; Defensins; Protein structure; Sequence Analysis, Protein; Amino Acid Sequence; Structural motif; Databases, Protein; Protein secondary structure; Multiple sequence alignment; Artificial neural network; Organic Chemistry; Decision Trees; Proteins; Pattern recognition; Artificial intelligence; Neural Networks, Computer; Algorithms |
Source: | The Protein Journal. 25(7-8) |
ISSN: | 1572-3887 |
Description: | Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function, and evolution. Recent research concentrates on the analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, the accuracy of protein secondary structure prediction based on multiple sequence alignment drops significantly when low-homology (twilight zone) sequences are considered. To this end, this paper addresses the problem of providing an alternative sequence representation that improves the ability to distinguish secondary structures in twilight zone sequences without using alignment. We consider a novel classification problem in which structural motifs, referred to as structural fragments (SFs), are defined as uniform strand, helix, and coil fragments. Classifying SFs makes it possible to design novel sequence representations and to investigate which other factors and prediction algorithms may improve discrimination. Comprehensive experimental results show that a statistically significant improvement in classification accuracy can be achieved by (1) improving sequence representations and (2) removing possible noise at the terminal residues of the SFs. Combining these two approaches reduces the error rate by 15% on average compared with classification using the standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms. |
Database: | OpenAIRE |
External link: |
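
The description above defines structural fragments (SFs) as uniform strand, helix, and coil fragments and mentions removing noise at their terminal residues. The following is only an illustrative sketch of that idea, not the authors' method: it assumes a three-state structure string (`H`, `E`, `C`) aligned to the sequence, and it uses a plain amino acid composition vector as a stand-in representation. The function names, the example sequence, and the choice of trimming one residue per end are all hypothetical.

```python
from itertools import groupby

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_fragments(sequence, structure):
    """Split a sequence into maximal runs that share one structure state."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    fragments = []
    start = 0
    for state, run in groupby(structure):
        length = len(list(run))
        fragments.append((state, sequence[start:start + length]))
        start += length
    return fragments


def trim_terminal(fragment, n=1):
    """Drop n residues from each end (assumed noise); '' if nothing remains."""
    if len(fragment) <= 2 * n:
        return ""
    return fragment[n:len(fragment) - n]


def composition(fragment):
    """Fraction of each standard amino acid in the fragment (assumed encoding)."""
    if not fragment:
        return [0.0] * len(AMINO_ACIDS)
    return [fragment.count(aa) / len(fragment) for aa in AMINO_ACIDS]


# Made-up example data, for illustration only.
sequence = "MKTAYIAKQRGSPVEVLV"
structure = "CCHHHHHHHHCCCEEEEE"
fragments = extract_fragments(sequence, structure)
features = [(state, composition(trim_terminal(frag))) for state, frag in fragments]
```

Each `(state, vector)` pair in `features` could then be passed to any classifier; the record names neural networks and boosted decision trees as the strongest performers, but the feature design here is an assumption rather than the representation used in the paper.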