MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle
Author: | Augusto Cesar Poot-Hernandez, Valeria Souza, Valerie De Anda, Icoquih Zapata-Peñasco, Luis E. Eguiarte, Bruno Contreras Moreira |
---|---|
Contributors: | Universidad Nacional Autónoma de México, Consejo Nacional de Ciencia y Tecnología (México), World Wildlife Fund, Ministerio de Economía y Competitividad (España), Fundación Agencia Aragonesa para la Investigación y el Desarrollo |
Language: | English |
Publication year: | 2017 |
Subject: |
0301 basic medicine;
Kullback–Leibler divergence; Pfam domains; Microbial Genomes; Computer science; 030106 microbiology; Protein domain; Health Informatics; Computational biology; Biology; computer.software_genre; Genome; 03 medical and health sciences; Software; sulfur cycle; RefSeq; Animals; Humans; Entropy (information theory); omic-datasets; multigenomic entropy-based score; metagenomics; business.industry; Research; relative entropy; Robustness (evolution); Sequence Analysis, DNA; metabolic machinery; Gastrointestinal Microbiome; Computer Science Applications; 030104 developmental biology; Metagenomics; Metric (mathematics); Metagenome; Data mining; business; computer; Metabolic Networks and Pathways; Sulfur |
Source: | Digital.CSIC. Repositorio Institucional del CSIC; Zaguán. Repositorio Digital de la Universidad de Zaragoza; GigaScience |
Description: | 17 pages, 7 figures, 1 table. © The Authors 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H′), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operating characteristic plots, and the area under the curve metric (AUC). 
Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. Valerie De Anda is a doctoral student from Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México (UNAM), and received fellowship 356 832 from Consejo Nacional de Ciencia y Tecnología (CONACYT). This research was also supported by funding from World Wildlife Fund (WWF)-Alianza Carlos Slim, Sep-Ciencia Básica Conacyt grant 238 245 to both Valeria Souza and Luis Enrique Eguiarte and Spanish MINECO grant CSIC13–4E-2490. Bruno Contreras Moreira was funded by Fundación ARAID. The sabbatical leaves of Luis Enrique Eguiarte and Valeria Souza at the University of Minnesota were supported by scholarships from Programa de Apoyos para la Superación del Personal Académico de la UNAM (PASPA), Dirección General de Asuntos del Personal Académico (DGAPA), UNAM. |
Database: | OpenAIRE |
External link: |
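
The description above says that each Pfam domain receives a relative entropy value measuring its enrichment among sulfur genomes, and that these values are combined into a final score for a sample. The following Python sketch illustrates one plausible reading of that idea and is not the MEBS implementation. It assumes each domain contributes a single Kullback–Leibler term, p * log2(p / q), where p is the fraction of sulfur genomes containing the domain and q is the fraction of all genomes containing it, and that a sample's score is the sum of the terms for the domains it contains. The function names, the toy genomes, and the identifier `PF_BACKGROUND` are hypothetical; only DsrC (PF04358) comes from the record.

```python
import math


def domain_frequencies(genomes, domains):
    """Fraction of genomes (each a set of domain IDs) that contain each domain."""
    n = len(genomes)
    return {d: sum(d in g for g in genomes) / n for d in domains}


def domain_relative_entropy(p_obs, q_exp):
    """Single Kullback-Leibler term p * log2(p / q) for one domain.

    Assumption: this per-domain term stands in for the H' of the abstract.
    """
    if p_obs == 0.0:
        return 0.0  # limit of p * log2(p / q) as p -> 0
    if q_exp == 0.0:
        raise ValueError("domain seen in the focal set but absent from the background")
    return p_obs * math.log2(p_obs / q_exp)


def sample_score(sample_domains, entropies):
    """Sum the entropy terms of the domains present in a sample (assumed scoring rule)."""
    return sum(h for d, h in entropies.items() if d in sample_domains)


# Toy data: four "sulfur" genomes plus four background genomes (all hypothetical).
sulfur_genomes = [
    {"PF04358", "PF_BACKGROUND"},
    {"PF04358"},
    {"PF04358", "PF_BACKGROUND"},
    {"PF04358"},
]
all_genomes = sulfur_genomes + [{"PF_BACKGROUND"}, {"PF_BACKGROUND"}, set(), {"PF_BACKGROUND"}]
domains = ["PF04358", "PF_BACKGROUND"]

p = domain_frequencies(sulfur_genomes, domains)  # observed in sulfur genomes
q = domain_frequencies(all_genomes, domains)     # expected from all genomes
entropies = {d: domain_relative_entropy(p[d], q[d]) for d in domains}

for d in domains:
    print(d, round(entropies[d], 4))
print("score of a sample containing only PF04358:", sample_score({"PF04358"}, entropies))
```

In this toy setting the domain found in every sulfur genome but only half of all genomes gets a positive value, while the domain that is more common in the background gets a negative one, so a sample carrying the enriched domain scores higher.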