Improved detection of disease-associated gut microbes using 16S sequence-based biomarkers.
Authors: | Chrisman BS; Department of Bioengineering, Stanford University, Serra Mall, Stanford, USA. briannac@stanford.edu., Paskov KM; Department of Biomedical Data Science, Stanford University, Serra Mall, Stanford, USA., Stockham N; Department of Neuroscience, Stanford University, Serra Mall, Stanford, USA., Jung JY; Department of Biomedical Data Science, Stanford University, Serra Mall, Stanford, USA., Varma M; Department of Computer Science, Stanford University, Serra Mall, Stanford, USA., Washington PY; Department of Bioengineering, Stanford University, Serra Mall, Stanford, USA., Tataru C; Department of Computer Science, Oregon State University, SW Campus Way, Corvallis, USA., Iwai S; Second Genome Inc, Allerton Ave, Brisbane, USA., DeSantis TZ; Second Genome Inc, Allerton Ave, Brisbane, USA., David M; Department of Microbiology, Oregon State University, SW Campus Way, Corvallis, USA., Wall DP; Department of Biomedical Data Science, Stanford University, Serra Mall, Stanford, USA.; Department of Pediatrics (Systems Medicine), Stanford University, 1265 Welch Road, Stanford, USA. |
---|---|
Language: | English |
Source: | BMC bioinformatics [BMC Bioinformatics] 2021 Oct 19; Vol. 22 (1), pp. 509. Date of Electronic Publication: 2021 Oct 19. |
DOI: | 10.1186/s12859-021-04427-7 |
Abstract: | Background: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, which limits the ability to identify differentially abundant groups of microbes between a diseased and a control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. Results: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods, both in a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR < 0.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR < 0.1). We observed Megasphaera and Sutterellaceae to be highly enriched in obesity, and Phocaeicola to be significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of 0.64 and the obesity phenotype with an ROC-AUC score of 0.84. Conclusions: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers . (© 2021. The Author(s).) |
Database: | MEDLINE |
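
The abstract describes grouping microbes by single variants in their 16S sequences and testing the groups for differential abundance with a permutation test at an FDR threshold. The sketch below illustrates that general idea on toy data; it is not the authors' implementation (see their repository for that). It assumes pre-aligned, equal-length sequences, uses a difference in group means as the test statistic, and applies Benjamini-Hochberg correction. All function names and inputs are hypothetical, and combinations of variants are not covered.

```python
import random
from collections import defaultdict


def aggregate_by_variant(sequences, counts):
    """Sum per-sample counts over all sequences sharing a (position, base) variant.

    sequences: equal-length aligned strings (hypothetical toy input).
    counts: counts[i][s] is the count of sequence i in sample s.
    Returns {(position, base): [summed count per sample]}.
    """
    n_samples = len(counts[0])
    groups = defaultdict(lambda: [0] * n_samples)
    for seq, row in zip(sequences, counts):
        for pos, base in enumerate(seq):
            group = groups[(pos, base)]
            for s, c in enumerate(row):
                group[s] += c
    return dict(groups)


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    labels: truthy for cases, falsy for controls; both groups must be non-empty.
    """
    rng = random.Random(seed)

    def mean_diff(lab):
        case = [v for v, is_case in zip(values, lab) if is_case]
        ctrl = [v for v, is_case in zip(values, lab) if not is_case]
        return sum(case) / len(case) - sum(ctrl) / len(ctrl)

    observed = abs(mean_diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling keeps the group sizes fixed
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, each variant group's per-sample totals would be passed to `permutation_p_value` together with the case/control labels, and the resulting p-values adjusted with `benjamini_hochberg` before applying a cutoff such as 0.1. Raw counts are summed here for simplicity; a real analysis would normally work with relative abundances.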