Using pseudoalignment and base quality to accurately quantify microbial community composition

Author: Reppell, M., Novembre, J.
Language: English
Year of publication: 2018
Subject:
0301 basic medicine
Computer science
computer.software_genre
Database and Informatics Methods
Human health
0302 clinical medicine
Software
RNA, Ribosomal, 16S
Databases, Genetic
Profiling (information science)
lcsh:QH301-705.5
Data Management
Base Composition
0303 health sciences
Ecology
Applied Mathematics
Simulation and Modeling
Microbiota
Genomics
Computational Theory and Mathematics
Medical Microbiology
Modeling and Simulation
Physical Sciences
Data mining
Sequence Analysis
Algorithms
Research Article
Microbial Taxonomy
Computer and Information Sciences
Multiple Alignment Calculation
Bioinformatics
Microbial Consortia
Quantitative Trait Loci
Sequencing data
Sequence Databases
Microbial Genomics
Quantitative trait locus
Research and Analysis Methods
Microbiology
Cellular and Molecular Neuroscience
03 medical and health sciences
Computational Techniques
Genetics
Humans
Computer Simulation
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Taxonomy
030304 developmental biology
business.industry
030306 microbiology
Biology and Life Sciences
Computational Biology
DNA
Biological classification
16S ribosomal RNA
Split-Decomposition Method
Biological Databases
030104 developmental biology
Microbial population biology
lcsh:Biology (General)
Reference database
Microbiome
Pooled dna
business
Scale (map)
Sequence Alignment
computer
Mathematics
030217 neurology & neurosurgery
Source: PLoS Computational Biology, Vol 14, Iss 4, p e1006096 (2018)
ISSN: 1553-7358
Description: Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.
Author summary: Pooled DNA from multiple unknown organisms arises in a variety of contexts. Determining the composition of pooled samples can be difficult, especially at the scale of modern data. Here we propose the novel method Karp, designed to perform taxonomic profiling in pooled DNA. Karp combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply Karp to the problem of classifying 16S rRNA reads using a reference database of known organisms. Using simulations, we show Karp is accurate across a variety of read lengths, reference sequence lengths, sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show that relative to other widely used quantification methods Karp reveals a larger number of microbiome quantitative trait association signals. Modern sequencing technology gives us unprecedented access to microbial communities, but uncovering significant findings requires correctly interpreting pooled microbial DNA. Karp provides a novel combination of speed and accuracy that makes it uniquely suited for enhancing discoveries in microbial studies.
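Illustrative sketch (not part of the original record, and not Karp's actual implementation): the general idea described above, k-mer pseudoalignment to shortlist candidate references followed by a base-quality-weighted likelihood to resolve reads that match several references, might look roughly like the toy Python below. All function names are hypothetical. The sketch assumes ungapped reads that start at position 0 of each reference, a uniform substitution model (error probability split evenly over the three other bases), and a plain EM loop for the abundance estimates.

```python
from collections import defaultdict


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def build_kmer_index(references, k):
    """Map every k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the reference sets of the read's k-mers.

    K-mers absent from the index are skipped (assumption: they are
    most likely sequencing errors).
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


def read_likelihood(read, quals, ref):
    """P(read | ref) under an ungapped, offset-0, per-base error model."""
    like = 1.0
    for base, q, ref_base in zip(read, quals, ref):
        e = phred_to_error(q)
        like *= (1 - e) if base == ref_base else e / 3
    return like


def estimate_abundances(reads, references, k, n_iter=50):
    """EM estimate of reference proportions from (sequence, qualities) pairs."""
    index = build_kmer_index(references, k)
    per_read = []
    for seq, quals in reads:
        candidates = pseudoalign(seq, index, k)
        if candidates:  # reads with no compatible reference are ignored
            per_read.append(
                {r: read_likelihood(seq, quals, references[r]) for r in candidates}
            )
    theta = {r: 1 / len(references) for r in references}
    for _ in range(n_iter):
        counts = dict.fromkeys(references, 0.0)
        for likes in per_read:
            total = sum(theta[r] * l for r, l in likes.items())
            if total == 0:
                continue
            # E-step: fractional assignment of this read to each candidate
            for r, l in likes.items():
                counts[r] += theta[r] * l / total
        n = sum(counts.values())
        if n == 0:
            return theta
        # M-step: proportions are the normalized expected read counts
        theta = {r: c / n for r, c in counts.items()}
    return theta
```

For example, with two toy references `{"A": "ACGTACGTAC", "B": "ACGTACGTAG"}` and `k=4`, the read `ACGTACGTAC` shares all of its k-mers with both references, so pseudoalignment alone cannot place it. A high base quality at the final position makes reference A far more likely, and three such reads plus one read matching B yield estimated proportions of approximately 0.75 and 0.25.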
Database: OpenAIRE