Tree-Based Algorithms for Protein Classification.

Authors: Kacprzyk, Janusz; Kelemen, Arpad; Abraham, Ajith; Chen, Yuehui; Busa-Fekete, Róbert; Kocsor, András; Pongor, Sándor
Source: Computational Intelligence in Bioinformatics; 2008, p165-182, 18p
Abstract: Protein sequence classification is one of the crucial tasks in the interpretation of genomic data. Many high-throughput systems have been developed to categorize proteins based only on their sequences. However, modelling how proteins have evolved can also help in classifying sequence data, and phylogenetic analysis has therefore gained importance in the field of protein classification. This approach does not rely on sequence similarity alone; it also considers the evolutionary information stored in a tree (e.g. a phylogenetic tree). Eisen was the first to use phylogenetic trees in protein classification, and his work revived the discipline of phylogenomics. In this chapter we give an overview of this area and propose two algorithms that are well suited to it. Both are based on a weighted binary tree representation of protein similarity data. TreeInsert assigns a class label to the query by determining the minimum cost necessary to insert the query into the (precomputed) trees representing the various classes. TreeNN assigns the label to the query based on an analysis of the query's neighborhood within a binary tree containing members of the known classes. The algorithms were tested in combination with various sequence similarity scoring methods (BLAST, Smith-Waterman, Local Alignment Kernel, and various compression-based distance scores) on a large number of classification tasks representing various degrees of difficulty. At the expense of a modest computational overhead, both TreeNN and TreeInsert exceed the performance of simple similarity search (1NN), as determined by ROC analysis. Combined with a fast tree-building method, both algorithms are suitable for web-based server applications. [ABSTRACT FROM AUTHOR]
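Illustrative note: the abstract contrasts simple similarity search (1NN) with labelling a query from its neighborhood in a weighted binary tree. The Python sketch below shows the general idea only and is not the chapter's TreeNN or TreeInsert procedure, whose details are not given in this record. The function names, the `radius` cutoff, the majority vote, and the toy tree are all assumptions made for illustration.

```python
from collections import Counter


def one_nn(distances, labels):
    """1NN baseline: return the label of the single closest known sequence."""
    nearest = min(distances, key=distances.get)
    return labels[nearest]


def tree_neighborhood_vote(edges, labels, query, radius):
    """Majority label among labeled leaves within `radius` of `query`.

    Distance is the sum of edge weights along the unique tree path.
    This voting rule is an assumption, not the published TreeNN rule.
    """
    # Build an undirected adjacency list from (node, node, weight) triples.
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    # Walk the tree from the query, accumulating path lengths.
    path_length = {query: 0.0}
    stack = [query]
    while stack:
        node = stack.pop()
        for neighbor, weight in adjacency.get(node, []):
            if neighbor not in path_length:
                path_length[neighbor] = path_length[node] + weight
                stack.append(neighbor)

    votes = Counter(
        labels[node]
        for node, dist in path_length.items()
        if node in labels and dist <= radius
    )
    return votes.most_common(1)[0][0] if votes else None


# Toy data (made up): two class-A leaves, two class-B leaves, one query "q".
toy_labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
toy_edges = [
    ("root", "n1", 1.0), ("root", "n2", 1.0),
    ("n1", "a1", 0.5), ("n1", "n3", 0.2),
    ("n3", "a2", 0.3), ("n3", "q", 0.3),
    ("n2", "b1", 0.4), ("n2", "b2", 0.4),
]
toy_distances = {"a1": 1.0, "a2": 0.6, "b1": 2.9, "b2": 2.9}

baseline_label = one_nn(toy_distances, toy_labels)
tree_label = tree_neighborhood_vote(toy_edges, toy_labels, "q", radius=1.5)
```

In practice the tree would be built from pairwise similarity scores such as those named in the abstract, and the neighborhood rule would follow the chapter itself, not this simplified vote.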
Database: Complementary Index