Filling annotation gaps in yeast genomes using genome-wide contact maps

Authors: Gilles Fischer, Martial Marbouty, Gianni Liti, Christophe Zimmer, Hervé Marie-Nelly, Axel Cournac, Romain Koszul
Contributors: Régulation spatiale des Génomes - Spatial Regulation of Genomes, Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS), Imagerie et Modélisation, Institut de Recherche sur le Cancer et le Vieillissement (IRCAN), Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), Biologie Computationnelle et Quantitative = Laboratory of Computational and Quantitative Biology (LCQB), Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut de Biologie Paris Seine (IBPS), Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS), R.K. from the European Research Council under the 7th Framework Program (FP7/2007-2013)/ERC grant agreement (260822) and by Agence Nationale de la Recherche (ANR-09-PIRI-0024) to C.Z. and R.K. H.M-N. is supported by a fellowship from Fondation pour la Recherche Médicale (FRM). MM is the recipient of an Association pour la Recherche sur le Cancer fellowship (20100600373) and C.Z. is also supported by a FRM grant (DEQ20100318291)., ANR-09-PIRI-0024,Chromodyn(2009), European Project: 260822,EC:FP7:ERC,ERC-2010-StG_20091118,DICIG(2011), Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS), Université Nice Sophia Antipolis (...
- 2019) (UNS), Institut de Biologie Paris Seine (IBPS), Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS), Martin, Marie, Programme interdisciplinaire sur les systèmes biologiques et d'innovation biomédicale - - Chromodyn2009 - ANR-09-PIRI-0024 - PIRI - VALID, Dynamic Interplay between Eukaryotic Chromosomes: Impact on Genome Stability - DICIG - - EC:FP7:ERC2011-06-01 - 2017-05-31 - 260822 - VALID
Language: English
Publication year: 2014
Subjects:
Statistics and Probability
MESH: Molecular Sequence Annotation/methods
[SDV]Life Sciences [q-bio]
Centromere
Computational biology
Biology
Origin of replication
DNA, Ribosomal
Synteny
Biochemistry
Genome
Chromosome conformation capture
03 medical and health sciences
Annotation
MESH: Genome, Fungal/genetics
0302 clinical medicine
Consensus Sequence
MESH: Consensus Sequence
Molecular Biology
Ribosomal DNA
030304 developmental biology
MESH: DNA, Ribosomal/genetics
Genetics
0303 health sciences
Chromosome
Molecular Sequence Annotation
MESH: Synteny
Genomics
Computer Science Applications
Computational Mathematics
MESH: Genomics/methods
Computational Theory and Mathematics
MESH: Centromere/genetics
Genetic Loci
Saccharomycetales
MESH: Saccharomycetales/genetics
Genome, Fungal
030217 neurology & neurosurgery
MESH: Genetic Loci/genetics
Source: Bioinformatics
Bioinformatics, Oxford University Press (OUP), 2014, 30 (15), pp.2105-13. ⟨10.1093/bioinformatics/btu162⟩
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/btu162
Abstract: Motivation: De novo sequencing of genomes is followed by annotation analyses that aim to identify functional genomic features such as genes, non-coding RNAs or regulatory sequences, taking advantage of diverse datasets. These steps sometimes fail to detect non-coding functional sequences: for example, origins of replication, centromeres and rDNA positions have proven difficult to annotate with high confidence. Here, we demonstrate an unconventional application of the chromosome conformation capture (3C) technique, which typically aims at deciphering the average 3D organization of genomes, by showing how functional information about the sequence can be extracted solely from the chromosome contact map. Results: Specifically, we describe a combined experimental and bioinformatic procedure that determines the genomic positions of centromeres and ribosomal DNA clusters in yeasts, including species where classical computational approaches fail. For instance, we determined the centromere positions in Naumovozyma castellii, where these coordinates could not be obtained previously. Although the computed centromere positions were characterized by conserved synteny with neighboring species, no consensus sequences could be found, suggesting that centromeric binding proteins or mechanisms have significantly diverged. We also used our approach to refine centromere positions in Kuraishia capsulata and to identify rDNA positions in Debaryomyces hansenii. Our study demonstrates how 3C data can be used to complete the functional annotation of eukaryotic genomes. Availability and implementation: The source code is provided in the Supplementary Material. This includes a zipped file with the Python code and a contact matrix of Saccharomyces cerevisiae. Contact: romain.koszul@pasteur.fr Supplementary information: Supplementary data are available at Bioinformatics online.
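Illustrative sketch (not the authors' Supplementary code): the idea of reading centromere positions off a contact map can be shown on toy data. The sketch below assumes that centromeres cluster in the yeast nucleus, so that the centromeric bin of each chromosome carries the strongest inter-chromosomal (trans) contact signal. All function names, the binning, the smoothing window and the synthetic matrix are hypothetical choices made for this example; the published procedure may differ in every detail.

```python
import numpy as np

def trans_contact_profile(contacts, chrom_of_bin, chrom):
    """Per-bin sum of contacts between `chrom` and all other chromosomes."""
    own = chrom_of_bin == chrom
    return contacts[np.ix_(own, ~own)].sum(axis=1)

def smooth(profile, window=3):
    """Moving average (assumed window size) to damp bin-to-bin noise."""
    kernel = np.ones(window) / window
    return np.convolve(profile, kernel, mode="same")

def estimate_centromere_bins(contacts, chrom_of_bin, window=3):
    """Return, per chromosome, the bin with the highest smoothed trans signal.

    Bin indices are relative to the start of each chromosome.
    """
    estimates = {}
    for chrom in np.unique(chrom_of_bin):
        profile = trans_contact_profile(contacts, chrom_of_bin, chrom)
        estimates[chrom.item()] = int(np.argmax(smooth(profile, window)))
    return estimates

def synthetic_contact_map(n_bins=20, centromeres=(5, 12), sigma=2.0):
    """Toy two-chromosome map whose trans contacts peak at the given bins."""
    pos = np.arange(n_bins)
    bumps = [np.exp(-((pos - c) ** 2) / (2 * sigma**2)) for c in centromeres]
    trans = 0.1 + np.outer(bumps[0], bumps[1])  # uniform background + peak
    contacts = np.zeros((2 * n_bins, 2 * n_bins))
    contacts[:n_bins, n_bins:] = trans
    contacts[n_bins:, :n_bins] = trans.T  # keep the matrix symmetric
    chrom_of_bin = np.repeat([0, 1], n_bins)
    return contacts, chrom_of_bin

if __name__ == "__main__":
    contacts, chrom_of_bin = synthetic_contact_map()
    print(estimate_centromere_bins(contacts, chrom_of_bin))
```

On real data the map would first need normalization, and a peak-fitting step would likely be more robust than a plain argmax; both are left out here to keep the example short.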
Database: OpenAIRE