Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups
Authors: | Zulema Udaondo, Trudy M. Wassenaar, Visanu Wanchai, Kaleb Z. Abram, Carissa Bleker, David W. Ussery, Michael S. Robeson |
---|---|
Year of publication: | 2021 |
Subject: |
QH301-705.5
Genetic Speciation; Classification and taxonomy; Medicine (miscellaneous); Biology; Genome informatics; medicine.disease_cause; Genome; Article; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; Gene duplication; Escherichia coli; medicine; Shigella; Biology (General); Gene; Phylogeny; 030304 developmental biology; Genetics; 0303 health sciences; Phylogenetic tree; 030306 microbiology; Escherichia coli Proteins; Computational Biology; Genomics; Sequence Analysis, DNA; Single copy; GenBank; General Agricultural and Biological Sciences; Genome, Bacterial |
Source: | Communications Biology, Vol 4, Iss 1, Pp 1-12 (2021) |
ISSN: | 2399-3642 |
Description: | In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome (medoid) identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). The authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single-copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in the E. coli species. Kaleb Abram and Zulema Udaondo et al. analyze over 100,000 publicly available E. coli and Shigella genome sequences and perform a Mash-based analysis to identify 14 unique phylogroups. Their results reveal that most of the sequenced E. coli genomes belong to four distinct phylogroups. |
Database: | OpenAIRE |
External link: |
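
The description mentions choosing a medoid for each phylogroup and using it as a proxy to classify further genomes. The sketch below illustrates that general idea only; it is not the authors' pipeline. It assumes pairwise distances (for example, Mash distances) have already been computed elsewhere. The genome names (`g1` to `g6`), the distance values, and the helper names `symmetric`, `find_medoid`, and `classify` are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

Distances = Dict[str, Dict[str, float]]


def symmetric(pairs: Dict[Tuple[str, str], float]) -> Distances:
    """Expand a dict of pairwise distances into a symmetric lookup table."""
    dist: Distances = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {a: 0.0})[b] = d
        dist.setdefault(b, {b: 0.0})[a] = d
    return dist


def find_medoid(members: Sequence[str], dist: Distances) -> str:
    """Return the member with the smallest summed distance to its group."""
    return min(members, key=lambda g: sum(dist[g][other] for other in members))


def classify(query_to_medoid: Dict[str, float]) -> str:
    """Assign a query to the phylogroup whose medoid is closest."""
    return min(query_to_medoid, key=query_to_medoid.get)


# Toy, made-up distances within two hypothetical groups.
dist = symmetric({
    ("g1", "g2"): 0.010, ("g1", "g3"): 0.030, ("g2", "g3"): 0.015,
    ("g4", "g5"): 0.020, ("g4", "g6"): 0.010, ("g5", "g6"): 0.025,
})
groups = {"A": ["g1", "g2", "g3"], "B1": ["g4", "g5", "g6"]}

# One representative genome per group.
medoids = {name: find_medoid(members, dist) for name, members in groups.items()}
print(medoids)

# Made-up distances from one unclassified sample to each group's medoid.
print(classify({"A": 0.012, "B1": 0.041}))
```

Comparing a new sample against one medoid per phylogroup, rather than against every reference genome, keeps the number of distance computations per sample equal to the number of phylogroups.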