PiSA-BLAST: A New Tool for Protein Structure Alignment and Database Search

Author: Chi-hua Tung, 董其樺
Publication year: 2005
Document type: Thesis (學位論文)
Description: 93
Structural database searching has become increasingly important as the number of known protein structures grows. This growth was nearly exponential in the early 1990s and has become linear over the past several years. As more protein crystal structures become available, demand is high for a fast and accurate method of searching for structures similar to a query structure. In this thesis, we developed a new tool, termed PiSA-BLAST, for protein structure database search that does not require the alignment of two 3D structures. We developed a new method for protein structure alignment that transforms 3D structures into 1D sequences. The method uses the kappa and alpha angles computed by the DSSP program to represent the protein 3D structure. Based on segment information and a clustering method, we transform the kappa and alpha angle information into coded regions. Each protein 3D structure can then be converted into a 1D sequence, and we developed a new substitution matrix that serves as the scoring matrix for sequence alignment over the 23 new codes. The encoded sequences are collected into a structure database. Running BLAST, a well-known sequence alignment tool, against this structure database quickly returns a list of proteins that are similar in structure. We evaluated PiSA-BLAST on five diverse data sets from SCOP and the Protein Data Bank. For the SCOP 95 data set, with 108 queries against 9,354 protein domains, the average precisions of PiSA-BLAST and CE are 78.2% and 82.1%, respectively, and the total execution times are 34 seconds for PiSA-BLAST and about 1,169,832 seconds for CE. For PSI-BLAST, the average precision is 69.8% and the time is 18.3 seconds.
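The encoding step described above, mapping each residue's kappa and alpha angles to a structural code, might look roughly like the following. This is a minimal sketch under stated assumptions, not the method as implemented in the thesis: the nearest-centroid rule, the function names, and the three centroids are all hypothetical placeholders (the thesis derives 23 codes from its own segment information and clustering), and the angles are assumed to be given in degrees as DSSP reports them.

```python
import math

# Hypothetical cluster centres (kappa, alpha) in degrees, keyed by code letter.
# Placeholder values for illustration only; the thesis uses 23 codes whose
# centres are not given in the abstract.
CENTROIDS = {
    "H": (110.0, 50.0),
    "E": (30.0, -170.0),
    "L": (70.0, 120.0),
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode_residue(kappa, alpha):
    """Return the code whose centroid is nearest to (kappa, alpha)."""
    def distance(code):
        centre_kappa, centre_alpha = CENTROIDS[code]
        # alpha is a dihedral angle, so it wraps around at +/-180 degrees.
        return math.hypot(kappa - centre_kappa,
                          angular_diff(alpha, centre_alpha))
    return min(CENTROIDS, key=distance)


def encode_structure(angles):
    """Turn a list of (kappa, alpha) pairs into a 1D code string."""
    return "".join(encode_residue(k, a) for k, a in angles)


encoded = encode_structure([(110.0, 50.0), (30.0, 175.0), (70.0, 120.0)])
```

The resulting strings could then be written out as FASTA-style records and searched with a sequence alignment tool, which is the idea the abstract attributes to PiSA-BLAST.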
Based on these experiments, we summarize several observations: (1) PiSA-BLAST is as fast as BLAST for protein structure database search and is about 34,000 times faster than CE on the SCOP 95 database. (2) The accuracy of PiSA-BLAST is close to that of CE and much better than that of BLAST and PSI-BLAST, which are based on amino-acid sequences. These results imply that our new structural codes and substitution matrix are useful for protein structure alignment. (3) For structure database search, an e-value of e-15 in PiSA-BLAST marks a significant hit, much as an e-value of e-3 does in BLAST for sequence database search. PiSA-BLAST achieved about 90% accuracy for a query when the e-value is less than e-15. (4) PiSA-BLAST is a useful filtering tool to run before a detailed database search with tools such as CE and DALI. (5) PiSA-BLAST can provide real-time web services for protein structure database search, as BLAST does for protein sequence search. We believe that this capability is important for structural genomics and proteomics.
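The speed-up in observation (1) and the pre-filtering idea in observations (3) and (4) can be illustrated with a short sketch. The timings are the ones reported in the abstract; everything else is an assumption made for illustration: "e-15" is read as 1e-15, the function names are hypothetical, and the hit list is toy data rather than output from PiSA-BLAST.

```python
# Total execution times reported in the abstract for SCOP 95
# (108 queries against 9,354 protein domains).
CE_SECONDS = 1_169_832
PISA_BLAST_SECONDS = 34


def speedup(slow_seconds, fast_seconds):
    """Ratio of the slower running time to the faster one."""
    return slow_seconds / fast_seconds


def prefilter(hits, max_evalue=1e-15):
    """Keep hits with e-value below the threshold, most significant first.

    `hits` is a list of (name, e_value) pairs. The default threshold assumes
    that "e-15" in the abstract means 1e-15.
    """
    kept = [(name, e_value) for name, e_value in hits if e_value < max_evalue]
    return sorted(kept, key=lambda hit: hit[1])


ratio = speedup(CE_SECONDS, PISA_BLAST_SECONDS)  # about 34,400

# Toy hits with hypothetical names and e-values, not results from the thesis.
toy_hits = [("domain_a", 3e-40), ("domain_b", 2e-7), ("domain_c", 1e-18)]
candidates = prefilter(toy_hits)
```

Only the surviving candidates would then be passed to a slower, detailed structure aligner such as CE or DALI.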
Database: Networked Digital Library of Theses & Dissertations