A Simplified Description of Child Tables for Sequence Similarity Search

Authors: Anish M S Shrestha, Martin C. Frith
Publication year: 2018
Subject:
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(6)
ISSN: 1557-9964
Description: Finding related nucleotide or protein sequences is a fundamental, diverse, and incompletely solved problem in bioinformatics. It is often tackled by seed-and-extend methods, which first find “seed” matches of diverse types, such as spaced seeds, subset seeds, or minimizers. Seeds are usually found using an index of the reference sequence(s), which stores seed positions in a suffix array or related data structure. A child table is a fundamental way to achieve fast lookup in an index, but previous descriptions have been overly complex. This paper aims to provide a more accessible description of child tables and to demonstrate their generality: they apply equally to all the above-mentioned seed types and more. We also show that child tables can be used without LCP (longest common prefix) tables, reducing the memory requirement.
Database: OpenAIRE