Autor: |
Fotios Kounelis, Christos Makris |
Jazyk: |
angličtina |
Rok vydání: |
2017 |
Předmět: |
|
Zdroj: |
AIMS Medical Science, Vol 4, Iss 4, Pp 426-440 (2017) |
Druh dokumentu: |
article |
ISSN: |
2375-1576 |
DOI: |
10.3934/medsci.2017.4.426 |
Popis: |
A significant problem in computer science is the management of large data strings and a great number of works dealing with the specific problem has been published in the scientific literature. In this article, we use a technique to store efficiently biological sequences, making use of data structures like suffix trees and inverted files and also employing techniques like n-grams, in order to improve previous constructions. In our attempt, we drastically reduce the space needed to store the inverted indexes, by representing the substrings that appear more frequently in a more compact inverted index. Our technique is based on n-gram indexing, providing us the extra advantage of indexing sequences that cannot be separated in words. Moreover, our technique combines classical one level with two-level n-gram inverted file indexing. Our results suggest that the new proposed algorithm can compress the data more efficiently than previous attempts. |
Databáze: |
Directory of Open Access Journals |
Externí odkaz: |
|