An alignment- and reference-free strategy using k-mer present pattern for population genomic analyses

Authors: Guohui Shi, Yi Dai, Da Zhou, Mengmeng Chen, Jiaqi Zhang, Yilong Bi, Shuai Liu, Qi Wu
Language: English
Year of publication: 2024
Subject:
Source: Mycology, pp. 1-15 (2024)
Document type: article
ISSN: 2150-1203; 2150-1211
DOI: 10.1080/21501203.2024.2358868
Abstract: Pangenomes are replacing single reference genomes as a way to capture all variants within a species or clade, but their analysis relies predominantly on graph-based methods that require multiple high-quality genomes and computationally intensive multiple-genome alignments. K-mer decomposition is an alternative to graph-based pangenomes. However, how k-mers can be used directly in population genetic analyses has remained unclear. Here, we developed a novel strategy that uses variation in k-mer counts across genomes for population analyses. To test the effectiveness of this method, we compared it directly with a SNP-based method in analyses of the population structure and genetic diversity of 267 Saccharomyces cerevisiae strains, using two simulated datasets and one real sequencing dataset. The population structure identified with k-mers recapitulated that obtained with SNPs, indicating that the k-mer-based approach is effective, and the higher genetic diversity estimated from the real dataset supported the conclusion that k-mers capture more genetic variants. Based on k-mer frequency, we identified not only SNPs but also insertions/deletions and horizontal gene transfer (HGT) fragments related to the adaptive evolution of S. cerevisiae. Our study establishes a framework for alignment- and reference-free (ARF) population genetic analyses, whose advantages should be most pronounced in species without a complete genome or in highly diverged species.
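The general idea of describing each genome by its k-mer content instead of by alignments to a reference can be illustrated with a toy sketch. The Python below is a minimal illustration under our own assumptions, not the authors' pipeline: it counts overlapping k-mers per sample, keeps the k-mers whose presence varies among samples as a binary presence/absence matrix, and compares samples with the Jaccard distance between their k-mer sets (a generic distance chosen here for illustration). The function names, the toy sequences, and k = 3 are hypothetical; real analyses would use much larger k, merge each k-mer with its reverse complement, and filter low-count k-mers that arise from sequencing errors.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def presence_matrix(samples: dict[str, str], k: int) -> tuple[list[str], dict[str, list[int]]]:
    """Binary presence/absence matrix restricted to k-mers absent from at least one sample."""
    counts = {name: kmer_counts(seq, k) for name, seq in samples.items()}
    all_kmers = sorted(set().union(*counts.values()))
    # k-mers present in every sample carry no information about differences between samples.
    variable = [km for km in all_kmers if not all(c[km] > 0 for c in counts.values())]
    matrix = {name: [int(c[km] > 0) for km in variable] for name, c in counts.items()}
    return variable, matrix


def jaccard_distances(samples: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard distance (1 - |A & B| / |A | B|) between the samples' k-mer sets."""
    sets = {name: set(kmer_counts(seq, k)) for name, seq in samples.items()}
    dist = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        # Guard against two empty sets (sequences shorter than k).
        dist[(a, b)] = 1.0 - len(sets[a] & sets[b]) / len(union) if union else 0.0
    return dist


# Hypothetical toy "genomes"; real inputs would be assemblies or sequencing reads.
samples = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTACGT"}

kmers, matrix = presence_matrix(samples, k=3)
print(kmers)
for name, row in matrix.items():
    print(name, row)
for pair, d in jaccard_distances(samples, k=3).items():
    print(pair, round(d, 3))
```

A distance matrix of this kind could then be passed to standard clustering or ordination methods to inspect population structure; in this toy example, s1 and s2, which differ only in their last base, come out as the closest pair.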
Database: Directory of Open Access Journals