Assisted transcriptome reconstruction and splicing orthology

Authors:	Anne Bergeron, Paul Guertin, Jean-Stéphane Varré, Samuel Blanquart, Amandine Perrin, Krister M. Swenson
Contributors:	Bioinformatics and Sequence Analysis (BONSAI), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de combinatoire et d'informatique mathématique [Montréal] (LaCIM), Centre de Recherches Mathématiques [Montréal] (CRM), Université de Montréal (UdeM)-Université du Québec à Montréal = University of Québec in Montréal (UQAM), Collège André-Grasset [Montréal], Hub Bioinformatique et Biostatistique - Bioinformatics and Biostatistics HUB, Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS), Génomique évolutive des Microbes / Microbial Evolutionary Genomics, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Institut de Biologie Computationnelle (IBC), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)
Funding:	AB is partially supported by Canada NSERC Grant number 05729-2014 and by Équipes associées Inria-FRQNT Grant number 188128. This research is supported by the Inria Associate Team program (Inria associated team CG-ALCODE, 2014-2016).
Language:	English
Publication year:	2016
Subject:	0301 basic medicine; Eukaryotes; Proteomics; Genome; Transcriptome; MESH: RNA / metabolism; Mice; MESH: Proteins / chemistry; Protein Isoforms; MESH: Animals; Splicing orthologs; Genetics; MESH: Transcriptome; MESH: Alternative Splicing; MESH: Proteins / metabolism; Exons; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; RNA splicing; Transcriptome prediction; DNA microarray; MESH: RNA / chemistry; Algorithms; MESH: RNA / genetics; Biotechnology; MESH: Computational Biology; Gene isoform; lcsh:QH426-470; lcsh:Biotechnology; MESH: Protein Isoforms / genetics; Computational biology; Biology; MESH: Protein Isoforms / metabolism; 03 medical and health sciences; lcsh:TP248.13-248.65; MESH: Proteins / genetics; Animals; Humans; Gene; MESH: Mice; MESH: Protein Isoforms / chemistry; MESH: Humans; Alternative splicing; Computational Biology; Proteins; Alternative Splicing; lcsh:Genetics; 030104 developmental biology; MESH: Algorithms; RNA; MESH: Exons
Source:	BMC Genomics, BioMed Central, 2016, Proceedings of the 14th Annual Research in Computational Molecular Biology (RECOMB) Comparative Genomics Satellite Workshop: genomics, 17 (Suppl 10), pp.786. ⟨10.1186/s12864-016-3103-6⟩; BMC Genomics, Vol 17, Iss S10, Pp 157-164 (2016)
ISSN:	1471-2164
DOI:	10.1186/s12864-016-3103-6
Description:	Background: Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21% of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve (as has been strongly suggested in the literature), it is important to evaluate assisted transcriptome prediction, that is, how well alternative transcription in one species predicts protein isoforms in another, relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs. Results: We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and on alignments at the protein level. Using our definition, we compared 24,382 human transcripts and 17,909 mouse transcripts from the highly curated CCDS database, and identified 11,122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62% of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74%. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade. Conclusions: Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::07ff4b687c4cc8ff1a3f89934888a731 https://hal.inria.fr/hal-01396410