Classifier ensembles for protein structural class prediction with varying homology

Authors: Scott Dick, Kanaka Durga Kedarisetti, Lukasz Kurgan
Year of publication: 2006
Subject:
Source: Biochemical and biophysical research communications. 348(3)
ISSN: 0006-291X
Description: Structural class characterizes the overall folding type of a protein or its domain. A number of computational methods have been proposed to predict structural class from the primary sequence; however, the accuracy of these methods is strongly affected by sequence homology. This paper proposes an ensemble classification method and a compact feature-based sequence representation. The method improves prediction accuracy for the four main structural classes compared to competing methods, and provides highly accurate predictions for sequences of widely varying homology. The experimental evaluation shows superior results for sequences spanning the entire homology spectrum, from 25% to 90% homology. Error rates were reduced by over 20% when compared with using individual prediction methods and the most commonly used composition vector representation of protein sequences. Comparisons with competing methods on three large benchmark datasets consistently show the superiority of the proposed method.
Database: OpenAIRE