Comprehensive evaluation of RNA-seq analysis pipelines in diploid and polyploid species.

Autor: Payá-Milans M; Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States., Olmstead JW; Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States., Nunez G; Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States., Rinehart TA; Thad Cochran Southern Horticultural Laboratory, USDA-Agricultural Research Service, PO Box 287, Poplarville, MS, 39470, United States.; Crop Production and Protection, USDA-Agricultural Research Service, 5601 Sunnyside Ave, Beltsville, MD, 20705, United States., Staton M; Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States.
Jazyk: angličtina
Zdroj: GigaScience [Gigascience] 2018 Dec 01; Vol. 7 (12). Date of Electronic Publication: 2018 Dec 01.
DOI: 10.1093/gigascience/giy132
Abstrakt: Background: The usual analysis of RNA sequencing (RNA-seq) reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and polyploid species with a distant reference genome, we have tested the influence of different tools at various steps of a typical RNA-seq analysis workflow on the recovery of useful processed data available for downstream analysis.
Findings: At the preprocessing step, we found error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis by selecting gentle quality trimming performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with size, quality, and completeness of de novo assemblies and with number of mapped reads. When selecting a reference genome from a related species to map reads, outcome was significantly improved when using mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP.
Conclusions: The selection of bioinformatic software tools for RNA-seq data analysis can maximize quality parameters on de novo assemblies and availability of reads in downstream analysis.
Databáze: MEDLINE
Nepřihlášeným uživatelům se plný text nezobrazuje