A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites
Authors: | Vincent Hakim, Thierry Mora, Marc Santolini |
---|---|
Contributors: | Laboratoire de Physique Statistique de l'ENS (LPS), Université Paris Diderot - Paris 7 (UPD7), Université Pierre et Marie Curie - Paris 6 (UPMC), Centre National de la Recherche Scientifique (CNRS), Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS), École normale supérieure - Paris (ENS Paris, ENS-PSL), Université Paris sciences et lettres (PSL) |
Language: | English |
Publication year: | 2014 |
Subject: | Statistical noise Gene regulatory network lcsh:Medicine Biochemistry Biophysics Theory Mice 0302 clinical medicine Cell Signaling Nucleic Acids Molecular Cell Biology lcsh:Science Cells Cultured Genetics Physics [PHYS]Physics [physics] 0303 health sciences Multidisciplinary Principle of maximum entropy Drosophila melanogaster Physical Sciences Sequence Analysis Algorithms Protein Binding Research Article Signal Transduction Base pair Molecular Sequence Data DNA transcription Biophysics Computational biology Response Elements Statistical Mechanics 03 medical and health sciences Animals Position-Specific Scoring Matrices Molecular Biology Techniques Sequencing Techniques Molecular Biology Theoretical Biology 030304 developmental biology Binding Sites Base Sequence Biology and life sciences lcsh:R Computational Biology DNA Cell Biology Models Theoretical DNA binding site Nucleotide Mapping lcsh:Q Pairwise comparison Transcriptional Signaling Gene expression 030217 neurology & neurosurgery Transcription Factors |
Source: | PLoS ONE, Public Library of Science, 2014, Vol. 9, Iss. 6, e99015. doi:10.1371/journal.pone.0099015 |
ISSN: | 1932-6203 |
DOI: | 10.1371/journal.pone.0099015 |
Description: | The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to the transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIP-seq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to a TF-DNA binding energy that, like the PWM model, depends additively on the nucleotide identity at each position of the TFBS, but that also includes additive contributions from pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides, and find that they are sparse and predominantly located between consecutive base pairs in the flanking regions of TFBSs. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in the accurate description of TFBS statistics obtained.
The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs. |
Database: | OpenAIRE |
External link: |
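The contrast the abstract draws between the PWM and the PIM can be illustrated with a short Python sketch. This is a minimal toy under our own assumptions, not the authors' implementation: the function names `pwm_score`, `pim_score` and `connected_correlation`, the parameter layout `h[i][base]` and `J[(i, j)][(a, b)]`, and the convention that a site's probability is proportional to exp(score), with score(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j), are all hypothetical. The four example sites are invented toy data. In the PIM the parameters would be fitted by maximum entropy so that the model reproduces the observed single-position and pairwise nucleotide frequencies; that fitting step is not shown.

```python
from itertools import combinations


def pwm_score(seq, h):
    """Independent-position (PWM-like) score: sum over i of h[i][seq[i]]."""
    return sum(h[i][base] for i, base in enumerate(seq))


def pim_score(seq, h, J):
    """Pairwise score (hypothetical layout): PWM terms plus J[(i, j)][(a, b)]
    for every pair of positions i < j. Pairs or base combinations missing
    from J contribute 0, mimicking a sparse set of couplings."""
    score = pwm_score(seq, h)
    for i, j in combinations(range(len(seq)), 2):
        score += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return score


def connected_correlation(sites, i, j, a, b):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b) over equal-length aligned sites.
    A model with independent positions (a PWM) predicts C = 0 for every pair."""
    n = len(sites)
    f_i = sum(s[i] == a for s in sites) / n
    f_j = sum(s[j] == b for s in sites) / n
    f_ij = sum(s[i] == a and s[j] == b for s in sites) / n
    return f_ij - f_i * f_j


# Toy aligned sites (invented): positions 0 and 1 always carry the same base.
sites = ["AAG", "CCG", "AAT", "CCT"]
print(connected_correlation(sites, 0, 1, "A", "A"))  # correlated pair
print(connected_correlation(sites, 0, 2, "A", "G"))  # independent pair

# Flat single-position terms and one hypothetical coupling between positions 0 and 1.
h = [{b: 0.0 for b in "ACGT"} for _ in range(3)]
J = {(0, 1): {("A", "A"): 1.0, ("C", "C"): 1.0}}
print(pwm_score("AAG", h), pwm_score("ACG", h))        # PWM cannot tell them apart
print(pim_score("AAG", h, J), pim_score("ACG", h, J))  # the coupling separates them
```

In this toy, the nonzero connected correlation between positions 0 and 1 is exactly the kind of statistic a PWM cannot reproduce, while a single pairwise coupling is enough to score the observed "AA" combination above the never-observed "AC" one.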