PerFSeeB: designing long high-weight single spaced seeds for full sensitivity alignment with a given number of mismatches

Authors: Valeriy Titarenko, Sofya Titarenko
Language: English
Year of publication: 2023
Subject:
Source: BMC Bioinformatics, Vol 24, Iss 1, Pp 1-37 (2023)
Document type: article
ISSN: 1471-2105
DOI: 10.1186/s12859-023-05517-4
Description: Abstract. Background: Technical progress in computational hardware allows researchers to use new approaches for sequence alignment problems. For a given sequence, we usually use smaller subsequences (anchors) to find possible candidate positions within a reference sequence. We may create pairs ("position", "subsequence") for the reference sequence and keep all such records without compression, even on a budget computer. As sequences for new and reference genomes differ, the goal is to find anchors that tolerate differences while keeping the number of candidate positions with the same anchors to a minimum. Spaced seeds (masks ignoring symbols at specific locations) are a way to approach the task. An ideal (full sensitivity) spaced seed should enable us to find all such positions subject to a given maximum number of permitted mismatches. Results: Several algorithms to assist seed generation are presented. The first one finds all permitted spaced seeds iteratively. We observe specific patterns for the seeds of the highest weight. There are often periodic seeds with a simple relation between the block size, the length of the seed and the length of the read. The second algorithm produces blocks for periodic seeds, for blocks of up to 50 symbols and up to nine mismatches. The third algorithm uses those lists to find spaced seeds for reads of an arbitrary length. Finally, we apply the seeds to a real dataset and compare the results with those for other popular seeds. Conclusions: The PerFSeeB approach helps to significantly reduce the number of possible alignment positions of reads for a known number of mismatches. Lists of long, high-weight spaced seeds are available in Additional file 1. The seeds have a higher weight than seeds from other papers and can usually be applied to shorter reads. Codes for all algorithms and periodic blocks can be found at https://github.com/vtman/PerFSeeB.
Database: Directory of Open Access Journals
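
The full-sensitivity condition mentioned in the abstract can be illustrated with a brute-force check. This is a minimal sketch and not one of the algorithms from the paper: the function names `seed_weight` and `is_full_sensitivity` are hypothetical, and the encoding of a seed as a string of '1' (symbol is compared) and '0' (symbol is ignored) is an assumption made for illustration. The check enumerates every placement of the mismatches in a read and is practical only for short reads and few mismatches.

```python
from itertools import combinations


def seed_weight(seed: str) -> int:
    """Weight of a seed: the number of positions that are compared ('1')."""
    return seed.count("1")


def is_full_sensitivity(seed: str, read_len: int, max_mismatches: int) -> bool:
    """Brute-force test of full sensitivity (illustrative assumption, not the
    paper's method): for every placement of the mismatches within the read,
    at least one shift of the seed must avoid all mismatched positions."""
    # Offsets inside the seed where symbols are actually compared.
    offsets = [i for i, c in enumerate(seed) if c == "1"]
    # All shifts at which the seed fits entirely inside the read.
    shifts = range(read_len - len(seed) + 1)
    # Checking the maximum number of mismatches is enough: a shift that
    # avoids a set of positions also avoids every subset of it.
    k = min(max_mismatches, read_len)
    for mismatches in combinations(range(read_len), k):
        bad = set(mismatches)
        if not any(all(s + o not in bad for o in offsets) for s in shifts):
            return False
    return True


print(seed_weight("1101"))                 # 3
print(is_full_sensitivity("101", 4, 1))    # True
print(is_full_sensitivity("101", 4, 2))    # False
print(is_full_sensitivity("11", 3, 1))     # False
```

In this toy case the seed "101" tolerates any single mismatch in a read of length 4, because the two possible shifts compare disjoint sets of positions, but two mismatches can block both shifts.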