A workflow with dedicated tools for preparing reference transcriptomes from non-model organisms has evidenced important biological information
Authors: | Claros-Diaz, Manuel Gonzalo; Benzekri, Hicham; Seoane, Pedro; Carmona, Rosario; Bautista, Rocío; Guerrero-Fernández, Darío; Fernández-Pozo, Noé |
---|---|
Language: | English |
Publication year: | 2014 |
Subject: | |
Source: | RIUMA. Repositorio Institucional de la Universidad de Málaga |
Description: | Constructing transcriptomes of non-model organisms is a common task nowadays owing to the advent of low-cost sequencing platforms. Because no reference is available, non-model organisms require a de novo assembly strategy. Moreover, DNA or RNA from non-model species usually comes from natural, highly heterozygous populations, which adds further complexity to the assembly process. To simplify the otherwise cumbersome task of obtaining the best transcriptome, we present an automatic, reliable pipeline that combines public and newly designed tools and handles both short and long reads. It starts with our pre-processing software SeqTrimNext, which extracts reliable reads and removes any uncertain sequence that could obscure the final result. Two different assemblers (usually MIRA3 and Euler-SR, or Oases) produce different sets of contigs, which are then clustered with CD-HIT to reduce redundancy. Mapping with Bowtie2 is used to discard artefactual contigs. Reliable contigs are analysed with our software Full-LengtherNext to discard non-coding sequences, split chimeras, detect transcripts encoding complete proteins, provide an overview of the transcriptome, and sort putative new or species-specific transcripts. Full-LengtherNext can also select the closest orthologue from a model species and extract a reference transcriptome that can later be used for RNA-seq studies. Finally, functional descriptions and GO, InterPro, KEGG and EC codes are assigned with Sma3 and AutoFact, and a set of microsatellite markers is obtained with MREPS. The pipeline has been automated with AutFlow, a framework developed in our laboratory to automate long, repetitive tasks; it allows decisions (such as choosing the better assembly) to be made during execution without human intervention.
This strategy has already been used to assemble several transcriptomes and to characterise functionally pollen tube genes (olive tree), genes related to eye development (sole) and to blight resistance (bean), as well as to compare gene family sizes (pine). Moreover, many genes have been cloned in pine, sole and olive tree from the sequences revealed by our pipeline. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. |
Database: | OpenAIRE |
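The abstract does not say which criterion AutFlow uses to decide which is the better assembly. Purely as an illustration, the Python sketch below assumes one common contiguity criterion, N50 (the contig length L such that contigs of length L or longer contain at least half of all assembled bases), with fewer contigs as a tie-breaker. The function names `n50` and `choose_assembly`, the tie-breaking rule and the toy contig lengths are hypothetical and are not taken from AutFlow or from the pipeline described above.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def choose_assembly(assemblies: Dict[str, List[int]]) -> str:
    """Hypothetical decision rule (an assumption, not AutFlow's):
    pick the assembly with the highest N50; on a tie, prefer the one
    with fewer contigs, i.e. the less fragmented result."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), -len(assemblies[name])),
    )


if __name__ == "__main__":
    # Toy contig-length lists standing in for real assembler output.
    candidates = {
        "assembler_A": [1200, 800, 600, 400, 300],
        "assembler_B": [2000, 500, 300, 200],
    }
    for name, lengths in candidates.items():
        print(name, "N50 =", n50(lengths))
    print("selected:", choose_assembly(candidates))
```

Running it reports an N50 of 800 for the first toy assembly and 2000 for the second, and selects `assembler_B`. A real decision step would probably weigh further evidence, such as the fraction of reads that map back to the contigs, because N50 alone can reward over-merged or chimeric contigs.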