A priori estimation of sequencing effort in complex microbial metatranscriptomes
Author: | Jorge Frias-Lopez, Toni Monleón-Getino |
---|---|
Year of publication: | 2020 |
Subject: | 0106 biological sciences; Computer science; Pipeline (computing); Extrapolation; Computational biology; 010603 evolutionary biology; 01 natural sciences; Deep sequencing; 03 medical and health sciences; rarefaction curve; Ecology, Evolution, Behavior and Systematics; Original Research; 030304 developmental biology; Nature and Landscape Conservation; metagenomics; 0303 health sciences; metatranscriptomics; sequencing effort; Ecology; Function (mathematics); Biodiversity; simulation; Expressió gènica; sample size; Expression (mathematics); Biodiversitat; machine learning; Sample size determination; Metagenomics; NGS; A priori and a posteriori; RNA; Gene expression |
Source: | Dipòsit Digital de la UB; Universidad de Barcelona; Ecology and Evolution |
Description: | Metatranscriptome analysis, or the analysis of the expression profiles of whole microbial communities, has the additional challenge of dealing with a complex system in which dozens of different organisms express genes simultaneously. An underlying issue for virtually all metatranscriptomic sequencing experiments is how to allocate the limited sequencing budget while guaranteeing that the libraries have sufficient depth to cover the breadth of expression of the community. Estimating the sequencing depth required to effectively sample the target metatranscriptome using RNA‐seq is an essential first step to obtain robust results in subsequent analysis and to avoid overexpansion once the information contained in the library reaches saturation. Here, we present a method to calculate the sequencing effort using a simulated series of metatranscriptomic/metagenomic matrices. This method is based on extrapolating a rarefaction curve with a Weibull growth model to estimate the maximum number of observed genes as a function of sequencing depth. This approach allowed us to compute the effort at different confidence intervals and to obtain an approximate a priori effort based on an initial fraction of sequences. The analytical pipeline presented here may be successfully used for the in‐depth and time‐effective characterization of complex microbial communities, representing a useful tool for the microbiome research community. It offers a new method to calculate the effort in saturation curves and to predict genes a priori using a simulated series of metatranscriptomic/metagenomic matrices. |
Database: | OpenAIRE |
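
The description mentions extrapolating a rarefaction curve with a Weibull growth model to estimate sequencing effort. The record does not give the model's parametrisation, so the sketch below assumes one common form, y(x) = a - b·exp(-c·x^d), where a is the asymptotic (maximum) number of observed genes. The function names and the parameter values are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows how, once such a curve has been fitted, it can be inverted to get the depth at which a chosen fraction of the asymptote is reached.

```python
import math


def weibull_growth(x, a, b, c, d):
    """Assumed Weibull growth form: expected observed genes at depth x."""
    return a - b * math.exp(-c * x ** d)


def depth_for_fraction(p, a, b, c, d):
    """Depth at which the assumed curve reaches fraction p of the asymptote a.

    Solves p * a = a - b * exp(-c * x**d) for x.
    """
    ratio = a * (1.0 - p) / b
    if not 0.0 < ratio < 1.0:
        raise ValueError("p must lie strictly between 1 - b/a and 1")
    return (-math.log(ratio) / c) ** (1.0 / d)


# Hypothetical fitted parameters (illustration only, not from the paper).
a, b, c, d = 50_000.0, 50_000.0, 1e-3, 0.6

depth_90 = depth_for_fraction(0.90, a, b, c, d)
depth_95 = depth_for_fraction(0.95, a, b, c, d)

print(f"depth for 90% of asymptote: {depth_90:.0f} reads")
print(f"depth for 95% of asymptote: {depth_95:.0f} reads")
```

In practice the four parameters would come from a nonlinear fit to an observed or simulated rarefaction curve; the inversion step stays the same whichever fitting routine is used.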