Author:	Marchet, Camille
Publication year:	2024
Subject:	Quantitative Biology - Genomics
Document type:	Working Paper
Description:	This paper provides a comprehensive survey of data structures for representing k-mer sets, which are fundamental in high-throughput sequencing analysis. It categorizes the methods into two main strategies: those using fingerprinting and hashing for compact storage, and those leveraging lexicographic properties for efficient representation. The paper reviews key operations supported by these structures, such as membership queries and dynamic updates, and highlights recent advancements in memory efficiency and query speed. A companion paper explores colored k-mer sets, which extend these concepts to integrate multiple datasets or genomes.
Database:	arXiv
External link:	http://arxiv.org/abs/2409.05210