Classifying protein fragment of unknown structure base on amino acid sequence pattern

Author: Chin-Yu Ko, 柯清瑀
Year of publication: 2005
Document type: Thesis (學位論文)
Description: 93
Protein structure prediction plays a major role in clinical research in biomedical science in the 21st century. In the past, protein structures were obtained by X-ray diffraction or nuclear magnetic resonance, but both methods have technical limitations. If a protein sequence fragment could be correctly matched to its structure, the unknown functions of a protein could be inferred more effectively and at lower cost, and many problems in the biomedical field might be resolved as a result. The major difficulty in developing a protein structure prediction method lies in selecting the protein backbone template, especially when no homologous protein is available for reference; this leads to deviation in the backbone structure even when the predicted structure is a local optimum. In this study, we propose a new method for classifying protein fragments, which not only identifies the possible structures of each fragment but also opens up a variety of possibilities for predicting the whole protein structure. The primary idea of the proposed method is pattern mining of protein fragment sequences. It is motivated by the observation that each class of protein fragments contains a finite number of specific sequence patterns, and that these patterns may carry not only sequence information but also information about possible molecular interactions. Once these patterns are found, an appropriate class can be assigned to each protein fragment and the fragments can be matched to their possible structures. Two proteins with similar structures do not necessarily have similar sequences; however, when both sequences are annotated with the classes that characterize different fragment structures, the two annotations should be similar. In theory, this can reduce the structural deviation caused by slight sequence differences.
Conventional secondary and tertiary protein structure prediction commonly depends on the existence of homologous proteins. Recognizing the potential drawbacks of this dependence, we deliberately omitted the step of searching for homologous proteins in the proposed method while retaining the concepts of homology modeling. The prediction accuracy is about 78% on the test data and more than 77% in whole-sequence cases.
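The abstract does not state how patterns are defined or mined, so the following is only a minimal sketch of the general idea of pattern-based fragment classification, not the method of the thesis. It assumes that a pattern is a contiguous k-residue substring (a k-mer), that a pattern belongs to a class when it occurs in at least a `min_support` fraction of that class's fragments, and that a fragment is assigned to the class with the most matching patterns. The function names, the class labels, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of contiguous length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mine_patterns(labeled_fragments, k=3, min_support=0.5):
    """Map each class to the k-mers present in at least min_support of its fragments."""
    by_class = defaultdict(list)
    for seq, label in labeled_fragments:
        by_class[label].append(seq)
    patterns = {}
    for label, seqs in by_class.items():
        counts = Counter()
        for seq in seqs:
            counts.update(kmers(seq, k))  # each k-mer counted once per fragment
        patterns[label] = {p for p, c in counts.items()
                           if c / len(seqs) >= min_support}
    return patterns


def classify(fragment, patterns, k=3):
    """Return the class with the most matching patterns, or None if nothing matches."""
    found = kmers(fragment, k)
    scores = {label: len(found & pats) for label, pats in patterns.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy training data (invented for illustration; labels are assumptions).
training = [
    ("AEELLKKA", "helix"),
    ("EELLKKLA", "helix"),
    ("VTVKVYVE", "strand"),
    ("TVKVYVTV", "strand"),
]
patterns = mine_patterns(training, k=3, min_support=1.0)
print(classify("GELLKKG", patterns))
print(classify("KVYVTG", patterns))
```

In this sketch no homologous protein is searched for at any point: a fragment is classified purely from the patterns it contains, which is the property the abstract emphasizes. A realistic implementation would need a richer pattern language (for example, gapped or degenerate patterns) and a principled way to break ties between classes.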
Database: Networked Digital Library of Theses & Dissertations