Sequence Alignment with Block Constraint

Author: Tsai, Ping Han, 蔡秉翰
Publication year: 2016
Document type: 學位論文 ; thesis
Description: 104
To determine whether two sequences are similar, we usually perform a pairwise alignment. In bioinformatics, sequence alignment is an important strategy for determining the identity between two DNA, RNA, or protein sequences. A sequence alignment can identify similar regions that may share a similar structure, function, or evolutionary relationship. Compared with the 20-letter protein alphabet, the 4-letter RNA alphabet is smaller and less informative. As a consequence, when the identity between two RNA sequences is under 60%, it is hard to determine whether the two sequences have similar structures. Thus, to align two RNA molecules, several studies have considered not merely sequence information but also secondary or tertiary structure information. Our lab developed a tool called iPARTS2 in 2016 that aligns two RNA 3D structures based on both primary and tertiary structure information. The basic steps of iPARTS2 are as follows. First, a Ramachandran-like diagram of RNAs was derived by plotting the nucleotides of RNA structures in the PDB database on a 2D plot using their two pseudo-torsion angles η and θ. Then, the affinity propagation clustering algorithm was applied to the η-θ plot to obtain 23 nucleotide conformations, which were combined with the RNA 1D sequence information (A, U, C, and G) to obtain a structural alphabet (SA) of 92 elements (23 × 4). Next, the SA was used to transform RNA 3D structures into 1D sequences of SA letters. Finally, classical sequence alignment methods were applied to the two SA-encoded sequences to determine their structural similarity. However, given two RNA molecules
Database: Networked Digital Library of Theses & Dissertations
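
The last three steps described in the abstract (building the 92-letter structural alphabet, encoding a structure as SA letters, and aligning two encoded sequences) can be illustrated with a minimal Python sketch. This is not the iPARTS2 implementation: the cluster ids passed to `encode` are hypothetical inputs standing in for the affinity propagation output, the integer numbering of SA letters is an arbitrary choice, and the match/mismatch/gap values are placeholders rather than the tool's actual scoring scheme. Needleman-Wunsch global alignment is used here only as one example of a "classical sequence alignment method".

```python
from itertools import product

BASES = "AUCG"
N_CONFORMATIONS = 23  # number of conformation clusters reported in the abstract

# Assumption: each (base, conformation cluster) pair gets one integer SA letter.
# 4 bases x 23 conformations = 92 letters, matching the abstract.
SA_INDEX = {pair: i for i, pair in enumerate(product(BASES, range(N_CONFORMATIONS)))}


def encode(bases, clusters):
    """Turn per-nucleotide bases and (hypothetical) cluster ids into SA letters."""
    return [SA_INDEX[(b, c)] for b, c in zip(bases, clusters)]


def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with placeholder scoring values."""
    # prev[j] holds the best score for aligning the first i-1 letters of x
    # with the first j letters of y.
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * gap]
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align x[i-1] with y[j-1]
                            prev[j] + gap,     # gap in y
                            curr[j - 1] + gap))  # gap in x
        prev = curr
    return prev[-1]


# Usage with made-up cluster ids for two short RNA fragments.
rna_a = encode("AUCG", [0, 5, 5, 22])
rna_b = encode("AUG", [0, 5, 22])
score = global_alignment_score(rna_a, rna_b)
```

In practice, a structural alphabet is usually paired with a substitution matrix that scores similar conformations more favorably than dissimilar ones; the flat match/mismatch scheme above is only there to keep the sketch short.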