Improved bacteria population structure analysis on thousands of genomes using unsupervised methods

Authors: David W. Ussery, Scott J. Emrich, Katrina Schlum, Se-Ran Jun, Zulema Udaondo
Language: English
Year of publication: 2019
Subject:
DOI: 10.1101/599944
Description: Over ten thousand genomes of Escherichia coli are now available, and this number will continue to grow for this and other important microbial species. The first approach often used to better understand microbes is phylogenetic group analysis followed by pan-genome analysis of highly related genomes. Here, we combine sequence-based features with unsupervised clustering on up to 2,231 E. coli genomes and a total of 1,367 Clostridium difficile genomes. We show that Non-negative Matrix Factorization (NMF) can identify “mixed”/cryptic genomes, and can better determine inter-related genome groups and their distinguishing features (genes) relative to prior methods.
Database: OpenAIRE
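
Note: the description names Non-negative Matrix Factorization as the clustering step but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes a genome-by-feature matrix of non-negative values and uses the standard multiplicative update rules for the Frobenius objective. The toy matrix `X`, the rank of 2, the 0.8 "mixed" threshold, and all function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of X - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2
               for xr, ar in zip(x, wh) for xv, av in zip(xr, ar)) ** 0.5


def nmf(x, rank, iterations=500, seed=0, eps=1e-9):
    """Factor X (genomes x features) into non-negative W (genomes x rank)
    and H (rank x features) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def memberships(w, h):
    """Rescale W so each row of H sums to 1, then express each genome's
    weights as proportions across the components."""
    scales = [sum(row) for row in h]
    scaled = [[wv * s for wv, s in zip(row, scales)] for row in w]
    return [[v / (sum(row) or 1.0) for v in row] for row in scaled]


# Hypothetical toy data: 7 "genomes" (rows) x 4 "features" (columns).
X = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 4, 3],
    [0, 0, 4, 4],
    [2, 2, 2, 2],  # deliberately draws on both feature sets
]

W, H = nmf(X, rank=2)
P = memberships(W, H)

# Assign each genome to its dominant component.
labels = [max(range(len(p)), key=p.__getitem__) for p in P]
# Flag genomes with no clearly dominant component (threshold is an assumption).
mixed = [i for i, p in enumerate(P) if max(p) < 0.8]

print("labels:", labels)
print("mixed genome rows:", mixed)
print("reconstruction error:", round(frobenius_error(X, W, H), 3))
```

In this sketch the rows of `H` play the role of distinguishing features for each group, and a genome whose membership proportions are split across components is the kind of case the description calls “mixed”. A real analysis would need a principled choice of rank and a library implementation suited to thousands of genomes.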