Insights from analyses of low complexity regions with canonical methods for protein sequence comparison

Autor:	Patryk Jarnot, Joanna Ziemska-Legiecka, Marcin Grynberg, Aleksandra Gruca
Rok vydání:	2022
Předmět:	Sequence Analysis Protein Cluster Analysis Proteins Amino Acid Sequence Amino Acids Molecular Biology Sequence Alignment Algorithms Information Systems
Zdroj:	Briefings in bioinformatics. 23(5)
ISSN:	1477-4054
Popis:	Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists are mainly focused on regions characterized by high diversity of amino acid composition. Similarity between regions of protein sequences frequently reflect functional similarity between them. In this article, we discuss strengths and weaknesses of the similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered to be the gold standard in protein similarity analysis and were designed for comparison of high complexity regions. However, we lack specialized methods that could be used to compare the similarity of low complexity regions. Therefore, we investigated the existing methods in order to understand how they can be applied to compare such regions. Our results are supported by exploratory study, discussion of amino acid composition and biological roles of selected examples. We show that existing methods need improvements to efficiently search for similar low complexity regions. We suggest features that have to be re-designed specifically for comparing low complexity regions: scoring matrix, multiple sequence alignment, e-value, local alignment and clustering based on a set of representative sequences. Results of this analysis can either be used to improve existing methods or to create new methods for the similarity analysis of low complexity regions.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::88a3a5917382af58a50004255c0ceada https://pubmed.ncbi.nlm.nih.gov/36341586 Zobrazit plný text záznamu Plný text ve formátu PDF Plný text ve formátu HTML
Nepřihlášeným uživatelům se plný text nezobrazuje	K zobrazení výsledku je třeba se přihlásit.