FM-index of alignment with gaps
Authors: | Thierry Lecroq, Joong Chae Na, Hyunjoon Kim, Kunsoo Park, Laurent Mouchard, Seunghwan Min, M. Léonard, Heejin Park |
---|---|
Contributors: | Equipe Traitement de l'information en Biologie Santé (TIBS - LITIS), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA), School of Computer Science and Engineering [Seoul] (School of CSE), Seoul National University [Seoul] (SNU), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Université Le Havre Normandie (ULH), Normandie Université (NU), Lecroq, Thierry |
Year of publication: | 2018 |
Subject: | FOS: Computer and information sciences; 0301 basic medicine; General Computer Science; Computer science; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; 0102 computer and information sciences; [INFO] Computer Science [cs]; 01 natural sciences; Genome; Pattern search; Theoretical Computer Science; law.invention; 03 medical and health sciences; law; Computer Science - Data Structures and Algorithms; Data_FILES; Data Structures and Algorithms (cs.DS); ComputingMilieux_MISCELLANEOUS; Suffix array; Siren (codec); 030104 developmental biology; Transformation (function); Index (publishing); 010201 computation theory & mathematics; Key (cryptography); Algorithm; FM-index |
Source: | Theoretical Computer Science, Elsevier, 2018, 710, pp. 148-157. ⟨10.1016/j.tcs.2017.02.020⟩ |
ISSN: | 0304-3975, 1879-2294 |
DOI: | 10.1016/j.tcs.2017.02.020 |
Description: | Recently, a compressed index for similar strings, called the FM-index of alignment (FMA), was proposed; it supports pattern search and random access. The FMA is efficient in both space and pattern search time, but it applies only to an alignment of similar strings without gaps. In this paper we propose the FM-index of alignment with gaps, a realistic index for similar strings that allows gaps in their alignment. To this end, we design a new version of the suffix array of alignment using an alignment transformation and a new definition of the alignment-suffix. The new suffix array of alignment lets us support LF-mapping and backward search, the key functionalities of the FM-index, regardless of whether the alignment contains gaps. We experimentally compared our index with RLCSA, due to Mäkinen et al., on 100 genome sequences from the 1000 Genomes Project. Our index is less than one third the size of RLCSA. 15 pages |
Database: | OpenAIRE |
External link: |
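
The description names LF-mapping and backward search as the key functionalities of the FM-index. As a rough illustration only, the sketch below shows how these two operations might look for the classic FM-index of a single string. It is not the FM-index of alignment with gaps described in the record, and it does not implement the alignment transformation or the alignment-suffix. The function names (`build_fm_index`, `lf_mapping`, `backward_search`), the `$` sentinel, and the naive suffix sorting are all assumptions made for this example.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for one string (illustrative, uncompressed)."""
    # Assumption: "$" does not occur in the text and sorts before every
    # character that does (true for letters and digits in ASCII).
    assert "$" not in text
    text += "$"
    # Naive suffix array construction: fine for a demo, far too slow
    # and memory-hungry for genome-scale inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # bwt[i] is the character that precedes the i-th smallest suffix
    # (text[-1] is the sentinel, which handles the suffix at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, bwt, c_table, occ


def lf_mapping(i, bwt, c_table, occ):
    """Map row i to the row of the suffix that starts one position earlier."""
    ch = bwt[i]
    return c_table[ch] + occ[ch][i]


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of pattern in the indexed text."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        # Narrow the range to suffixes that are preceded by ch.
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))
```

For example, after `sa, bwt, c_table, occ = build_fm_index("banana")`, the call `backward_search("ana", sa, c_table, occ)` returns `[1, 3]`. A practical index would replace the full `occ` table and the plain suffix array with sampled, compressed structures; the dense tables here are kept only to make the two operations easy to follow.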