Author: |
Jessica K. Bonnie, Omar Y. Ahmed, Ben Langmead |
Language: |
English |
Year of publication: |
2024 |
Source: |
iScience, Vol 27, Iss 3, Pp 109054- (2024) |
Document type: |
article |
ISSN: |
2589-0042 |
DOI: |
10.1016/j.isci.2024.109054 |
Description: |
Summary: Genome assembly databases are growing rapidly. The redundancy of sequence content between a new assembly and previous ones is neither conceptually nor algorithmically easy to measure. We introduce pertinent methods and DandD, a tool addressing how much new sequence is gained when a sequence collection grows. DandD can describe how much structural variation is discovered in each new human genome assembly and when discoveries will level off in the future. DandD uses a measure called δ (“delta”), developed initially for data compression and chiefly dependent on k-mer counts. DandD rapidly estimates δ using genomic sketches. We propose δ as an alternative to k-mer-specific cardinalities when computing the Jaccard coefficient, thereby avoiding the pitfalls of a poor choice of k. We demonstrate the utility of DandD’s functions for estimating δ, characterizing the rate of pangenome growth, and computing all-pairs similarities using k-independent Jaccard. |
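To make the δ measure concrete, here is a minimal, exact sketch under stated assumptions. It uses the standard definition δ = max over k ≥ 1 of d_k / k, where d_k is the number of distinct k-mers. k-mers are counted within each sequence only, never across sequence boundaries, and without reverse-complement canonicalization. DandD itself estimates d_k with genomic sketches rather than exact sets, so this toy version only suits small inputs. The function names `delta` and `delta_jaccard` are hypothetical. The δ-based Jaccard formula shown, (δ(A) + δ(B) − δ(A∪B)) / δ(A∪B), is an assumed inclusion-exclusion substitution of δ for k-mer cardinalities and may differ from the paper's exact definition.

```python
def distinct_kmers(seqs, k):
    """Return the set of distinct length-k substrings across all sequences
    (k-mers spanning two sequences are not counted)."""
    kmers = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])
    return kmers


def delta(seqs):
    """Exact delta = max over k >= 1 of d_k / k.

    Returns (delta, argmax_k). Assumption: exact k-mer sets, no sketching,
    no canonical (reverse-complement) k-mers.
    """
    max_len = max((len(s) for s in seqs), default=0)
    best, best_k = 0.0, 0
    for k in range(1, max_len + 1):
        ratio = len(distinct_kmers(seqs, k)) / k
        if ratio > best:
            best, best_k = ratio, k
    return best, best_k


def delta_jaccard(a, b):
    """Hypothetical k-independent Jaccard: replace k-mer cardinalities
    with delta in the inclusion-exclusion form of |A & B| / |A | B|."""
    da, _ = delta(a)
    db, _ = delta(b)
    du, _ = delta(list(a) + list(b))  # delta of the union collection
    return (da + db - du) / du if du else 0.0


if __name__ == "__main__":
    print(delta(["ACGTACGT"]))
    print(delta_jaccard(["ACGTACGT"], ["ACGTTTTT"]))
```

Because d_k(A∪B) lies between max(d_k(A), d_k(B)) and d_k(A) + d_k(B) for every k, the same bounds hold for δ. This keeps the assumed δ-Jaccard within [0, 1], and identical collections score exactly 1 no matter which k attains the maximum.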
Database: |
Directory of Open Access Journals |