A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome
Author: | Tanja Woyke, Daniel Barich, Joan L. Slonczewski, Rob Egan, Volkan Sevim, Dongwan D. Kang, Derek N. Macklin, Rachael M. Morgan-Kiss, Jeff Froula, Harrison Ho, Frederik Schulz, Wei Li, Zhong Wang, Kayla McCue, Shijie Yao, Rachel Orsini, Christopher J. Sedlacek, Jackie E. Shay |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | 0303 health sciences, Phylogenetic tree, 030306 microbiology, Human Genome, 15. Life on land, Biology, Genome, Visualization, 03 medical and health sciences, Taxon, Networking and Information Technology R&D (NITRD), Phylogenetics, Metagenomics, Evolutionary biology, Genetics, Giant Virus, Cluster analysis, 030304 developmental biology |
DOI: | 10.1101/812917 |
Description: | Classifying taxa, including those that have not previously been identified, is a key task in characterizing the microbial communities of under-described habitats, such as the permanently ice-covered lakes in the dry valleys of Antarctica. Current supervised, phylogeny-based methods fall short in classifying species whose genomes are assembled from metagenomic datasets of such habitats, because these genomes are often incomplete or lack close known relatives. Here, we report an efficient software suite, "Genome Constellation", that can rapidly characterize a large number of metagenome-assembled genomes. Genome Constellation estimates similarities between genomes based on their k-mer matches, and then uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while also revealing unexpected connections. In a dataset of 1,693 draft genomes assembled from the Antarctic lake communities, of which only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. The clustering-based analysis revealed several novel taxa groups, including six clusters that may represent new bacterial phyla. Remarkably, we discovered 63 new giant viruses, 3 of which could not be found using the traditional marker-based approach. In summary, we demonstrate that Genome Constellation provides an unbiased option to rapidly analyze a large number of microbial genomes and visually explore their relatedness. The software is available under the BSD license at: https://bitbucket.org/berkeleylab/jgi-genomeconstellation/. |
Database: | OpenAIRE |
External link: |
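The description above summarizes the general idea: estimate genome-to-genome similarity from shared k-mers, then reuse those similarities for classification and clustering. The sketch below illustrates that idea on toy sequences. It is not Genome Constellation's implementation. This record does not specify the similarity measure, clustering method, or parameters the software uses, so the Jaccard index over k-mer sets, single-linkage clustering at a fixed threshold, nearest-reference classification, and all function names (`kmer_set`, `jaccard`, `cluster_genomes`, `classify`) are hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed similarity measure: |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_genomes(genomes: dict[str, str], k: int = 4,
                    threshold: float = 0.5) -> list[set[str]]:
    """Hypothetical single-linkage clustering: any pair with similarity >= threshold
    is joined, and clusters are the resulting connected components (union-find)."""
    kmers = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    parent = {name: name for name in genomes}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def classify(query: str, references: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Hypothetical classification: assign the query to the most similar reference."""
    q = kmer_set(query, k)
    scores = ((name, jaccard(q, kmer_set(seq, k))) for name, seq in references.items())
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    toy = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAA", "g3": "TTTTGGGGCC"}
    print(cluster_genomes(toy, k=4, threshold=0.5))
    print(classify("ACGTACGTAC", {"g2": toy["g2"], "g3": toy["g3"]}, k=4))
```

With these toy inputs, `g1` and `g2` share 4 of their 5 distinct 4-mers (similarity 0.8) and fall into one cluster, while `g3` shares none and stays alone. Real genome comparisons would use much longer k and far larger sequences, and all-pairs comparison as written scales quadratically with the number of genomes.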