A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites

Authors: Vincent Hakim, Thierry Mora, Marc Santolini
Contributors: Laboratoire de Physique Statistique de l'ENS (LPS), Université Paris Diderot - Paris 7 (UPD7)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Université Paris Diderot - Paris 7 (UPD7)-Centre National de la Recherche Scientifique (CNRS)
Language: English
Publication year: 2014
Subject:
Statistical noise
Gene regulatory network
lcsh:Medicine
Biochemistry
Biophysics Theory
Mice
0302 clinical medicine
Cell Signaling
Nucleic Acids
Molecular Cell Biology
lcsh:Science
Cells, Cultured
Genetics
Physics
[PHYS]Physics [physics]
0303 health sciences
Multidisciplinary
Principle of maximum entropy
Drosophila melanogaster
Physical Sciences
Sequence Analysis
Algorithms
Protein Binding
Research Article
Signal Transduction
Base pair
Molecular Sequence Data
DNA transcription
Biophysics
Computational biology
Response Elements
Statistical Mechanics
03 medical and health sciences
Animals
Position-Specific Scoring Matrices
Molecular Biology Techniques
Sequencing Techniques
Molecular Biology
Theoretical Biology
030304 developmental biology
Binding Sites
Base Sequence
Biology and life sciences
lcsh:R
Computational Biology
DNA
Cell Biology
Models, Theoretical
DNA binding site
Nucleotide Mapping
lcsh:Q
Pairwise comparison
Transcriptional Signaling
Gene expression
030217 neurology & neurosurgery
Transcription Factors
Source: PLoS ONE
PLoS ONE, Public Library of Science, 2014, 9 (6), pp.e99015. ⟨10.1371/journal.pone.0099015⟩
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0099015
Description: The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to the transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIPseq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to considering a TF-DNA binding energy that depends additively on each nucleotide identity at all positions in the TFBS, like the PWM model, but also additively on pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs, and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides, and find that they are sparse and dominantly located between consecutive base pairs in the flanking region of TFBS. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in the obtained accurate description of TFBS statistics. The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs.
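The additive energy picture in the description above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `pwm_energy`, `pim_energy` and `boltzmann`, the sign convention (lower energy means more probable, with probability proportional to exp(-energy)), and the storage of pair terms as a sparse dictionary are all assumptions made for illustration. Fitting the parameters to ChIPseq data is not shown.

```python
import math
from itertools import product

BASES = "ACGT"  # index 0..3 for each nucleotide


def pwm_energy(seq, h):
    """Independent-site energy: sum of h[i][base] over positions i."""
    return sum(h[i][BASES.index(b)] for i, b in enumerate(seq))


def pim_energy(seq, h, J):
    """PWM-like single-site terms plus additive pair terms.

    J is a hypothetical sparse container: it maps a position pair (i, j)
    to a 4x4 table indexed by the bases found at positions i and j.
    """
    idx = [BASES.index(b) for b in seq]
    energy = sum(h[i][a] for i, a in enumerate(idx))
    for (i, j), table in J.items():
        energy += table[idx[i]][idx[j]]
    return energy


def boltzmann(h, J):
    """Normalized probabilities exp(-E)/Z over all sequences of length len(h).

    Enumerates 4**L sequences, so it is only practical for short toy sites.
    """
    seqs = ["".join(p) for p in product(BASES, repeat=len(h))]
    weights = [math.exp(-pim_energy(s, h, J)) for s in seqs]
    z = sum(weights)
    return {s: w / z for s, w in zip(seqs, weights)}


# Toy parameters (made up): a 3-bp site that favors A at position 0,
# with one pair term favoring A at position 0 together with C at position 1.
h = [[0.0] * 4 for _ in range(3)]
h[0][0] = -1.0
pair = [[0.0] * 4 for _ in range(4)]
pair[0][1] = -0.5
J = {(0, 1): pair}

print(pwm_energy("ACG", h), pim_energy("ACG", h, J))
```

With an empty `J` the pairwise model reduces to the independent-site (PWM-like) model, which mirrors the statement that the PIM generalizes the PWM.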
Database: OpenAIRE