Autor: |
Sebastian Schmidt, Shahbaz Khan, Jarno N. Alanko, Giulio E. Pibiri, Alexandru I. Tomescu |
Jazyk: |
angličtina |
Rok vydání: |
2023 |
Předmět: |
|
Zdroj: |
Genome Biology, Vol 24, Iss 1, Pp 1-32 (2023) |
Druh dokumentu: |
article |
ISSN: |
1474-760X |
DOI: |
10.1186/s13059-023-02968-z |
Popis: |
Abstract We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work. |
Databáze: |
Directory of Open Access Journals |
Externí odkaz: |
|