A Simplified Description of Child Tables for Sequence Similarity Search
Author: | Anish M S Shrestha, Martin C. Frith |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine
Theoretical computer science; Computer science; 0206 medical engineering; 02 engineering and technology; Table (information); law.invention; 03 medical and health sciences; law; Genetics; Generality; Models, Statistical; Applied Mathematics; Suffix array; LCP array; Computational Biology; Data structure; 030104 developmental biology; Index (publishing); Task analysis; Sequence Alignment; Sequence Analysis; 020602 bioinformatics; Algorithms; Software; Biotechnology; Reference genome |
Source: | IEEE/ACM Transactions on Computational Biology and Bioinformatics. 15(6) |
ISSN: | 1557-9964 |
Description: | Finding related nucleotide or protein sequences is a fundamental, diverse, and incompletely solved problem in bioinformatics. It is often tackled by seed-and-extend methods, which first find “seed” matches of diverse types, such as spaced seeds, subset seeds, or minimizers. Seeds are usually found using an index of the reference sequence(s), which stores seed positions in a suffix array or related data structure. A child table is a fundamental way to achieve fast lookup in an index, but previous descriptions have been overly complex. This paper aims to provide a more accessible description of child tables, and demonstrate their generality: they apply equally to all the above-mentioned seed types and more. We also show that child tables can be used without LCP (longest common prefix) tables, reducing the memory requirement. |
Database: | OpenAIRE |