A hybrid algorithm for identifying partially conserved regions in multiple sequence alignment.

Authors: Perera, Gamage Kokila Kasuni; Wannige, Champi Thusangi
Subject:
Source: International Journal of Computers & Applications; Nov 2021, Vol. 43 Issue 10, p979-986, 8p
Abstract: Multiple sequence alignment (MSA) algorithms are used to infer homologous regions in DNA and protein sequences, which provide the basis for many microbiological studies. The center star method is an MSA algorithm that can handle large-scale datasets, but it tends to produce poor results when the set of sequences contains multiple centers. In such cases, partially conserved regions are often hidden in the alignment. We introduce an algorithm that addresses this problem by combining the center star and progressive methods for MSA. In this algorithm, we first identify subsets within the set of sequences by applying the bisecting k-means algorithm, using k-mers as the attributes for clustering. The center star method is then applied separately to each subset. Finally, we merge the resulting alignments by following a progressive alignment approach. The algorithm is evaluated on a set of DNA sequences from HIV-1 infected patients with a known transmission chain. According to the results, the new algorithm produces alignments with better sum-of-pairs scores than the center star method, and the resulting final alignment yields a more accurate phylogeny than those of the center star and progressive methods. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
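
The clustering step described in the abstract (bisecting k-means over k-mer attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (kmer_profile, two_means, bisecting_kmeans), the choice of k = 2, the use of normalized k-mer frequencies as features, the deterministic seeding, and the rule of always splitting the cluster with the largest within-cluster squared error are all assumptions made for illustration. The per-cluster center star alignment and the progressive merge are not shown.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2, alphabet="ACGT"):
    """Normalized k-mer frequencies of seq, in a fixed k-mer order (assumed feature)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sse(vectors, members):
    """Sum of squared distances from each member to its cluster centroid."""
    c = centroid([vectors[i] for i in members])
    return sum(sq_dist(vectors[i], c) for i in members)


def two_means(vectors, members, iters=20):
    """Split the index list `members` into two groups with plain 2-means."""
    # Deterministic seeding (an assumption): first member and the one farthest from it.
    first = members[0]
    far = max(members, key=lambda i: sq_dist(vectors[i], vectors[first]))
    c0, c1 = vectors[first], vectors[far]
    left, right = list(members), []
    for _ in range(iters):
        left = [i for i in members
                if sq_dist(vectors[i], c0) <= sq_dist(vectors[i], c1)]
        right = [i for i in members if i not in left]
        if not left or not right:
            break
        new0 = centroid([vectors[i] for i in left])
        new1 = centroid([vectors[i] for i in right])
        if new0 == c0 and new1 == c1:  # assignments have stabilized
            break
        c0, c1 = new0, new1
    return left, right


def bisecting_kmeans(vectors, n_clusters):
    """Repeatedly bisect the cluster with the largest SSE until n_clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        worst = max(splittable, key=lambda c: sse(vectors, c))
        left, right = two_means(vectors, worst)
        if not left or not right:  # identical points: cannot be split further
            break
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    seqs = ["AAAAAAAACA", "AAAAAAAAAA", "GTGTGTGTGT", "TGTGTGTGTG"]
    profiles = [kmer_profile(s) for s in seqs]
    for cluster in bisecting_kmeans(profiles, 2):
        print([seqs[i] for i in cluster])
```

Each returned cluster is a list of indices into the input, so a downstream step could run the center star method on the sequences of each cluster separately before merging the per-cluster alignments.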