A Simplified Description of Child Tables for Sequence Similarity Search

Authors:	Anish M S Shrestha, Martin C. Frith
Year of publication:	2018
Subject:	0301 basic medicine; Theoretical computer science; Computer science; 0206 medical engineering; 02 engineering and technology; Table (information); law.invention; 03 medical and health sciences; law; Genetics; Generality; Models, Statistical; Applied Mathematics; Suffix array; LCP array; Computational Biology; Data structure; 030104 developmental biology; Index (publishing); Task analysis; Sequence Alignment; Sequence Analysis; 020602 bioinformatics; Algorithms; Software; Biotechnology; Reference genome
Source:	IEEE/ACM Transactions on Computational Biology and Bioinformatics. 15(6)
ISSN:	1557-9964
Description:	Finding related nucleotide or protein sequences is a fundamental, diverse, and incompletely-solved problem in bioinformatics. It is often tackled by seed-and-extend methods, which first find “seed” matches of diverse types, such as spaced seeds, subset seeds, or minimizers. Seeds are usually found using an index of the reference sequence(s), which stores seed positions in a suffix array or related data structure. A child table is a fundamental way to achieve fast lookup in an index, but previous descriptions have been overly complex. This paper aims to provide a more accessible description of child tables, and demonstrate their generality: they apply equally to all the above-mentioned seed types and more. We also show that child tables can be used without LCP (longest common prefix) tables, reducing the memory requirement.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b0a9e92e021f7f2208a938c5c67d147d https://pubmed.ncbi.nlm.nih.gov/29994365