DNA sequence and structure: direct and indirect recognition in protein-DNA binding

Author:	Richard H. Lathrop, S.D. Murphy, N.R. Steffen, L. Tolleri, G.W. Hatfield
Year of publication:	2002
Subject:	Statistics and Probability; DNA, Bacterial; Integration Host Factors; Models, Molecular; Sequence analysis; Macromolecular Substances; Molecular Sequence Data; Computational biology; Biology; Biochemistry; chemistry.chemical_compound; Structure-Activity Relationship; A-DNA; Amino Acid Sequence; Binding site; Molecular Biology; Peptide sequence; Genetics; Multiple sequence alignment; Binding Sites; Models, Statistical; Escherichia coli Proteins; Sequence Analysis, DNA; Computer Science Applications; DNA binding site; DNA-Binding Proteins; Computational Mathematics; Computational Theory and Mathematics; chemistry; Models, Chemical; Threading (protein sequence); Sequence Alignment; DNA; Algorithms; Protein Binding
Source:	ISMB
ISSN:	1367-4803
Description:	Motivation: Direct recognition, or direct readout, of DNA bases by a DNA-binding protein involves amino acids that interact directly with features specific to each base. Experimental evidence also shows that in many cases the protein achieves partial sequence specificity by indirect recognition, i.e., by recognizing structural properties of the DNA. (1) Could threading a DNA sequence onto a crystal structure of bound DNA help explain the indirect recognition component of sequence specificity? (2) Might the resulting pure-structure computational motif manifest itself in familiar sequence-based computational motifs? Results: The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences, and the threading score was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and hence to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences.
The conclusion is that deformation energy explains part of indirect recognition, which in turn explains part of IHF sequence-specific binding. Availability: Code and data on request. Contact: Nick Steffen (nsteffen@uci.edu) for code and Lorenzo Tolleri (Lorenzo_Tolleri@chiron.it) for data. Keywords: protein-DNA binding sites; sequence motifs or patterns; indirect recognition or readout; integration host factor; IHF.
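The abstract's statement that a linear Markov network gives a joint distribution over sequences computable in linear time can be illustrated with a minimal sketch. It assumes, purely for illustration, that the threading energy of a sequence is a sum of position-specific dinucleotide step energies and that sequence probability follows a Boltzmann form. The energy values, `kT`, and every function name below are hypothetical and are not taken from the paper.

```python
import itertools
import math
import random

BASES = "ACGT"


def make_toy_step_energies(n_steps, seed=0):
    """Hypothetical energies: one 4x4 table per dinucleotide step of the structure."""
    rng = random.Random(seed)
    return [
        {(a, b): rng.uniform(0.0, 3.0) for a in BASES for b in BASES}
        for _ in range(n_steps)
    ]


def threading_energy(seq, step_energies):
    """Sum the step energy of each adjacent base pair at its position in the structure."""
    if len(seq) != len(step_energies) + 1:
        raise ValueError("sequence length must be number of steps + 1")
    return sum(e[(a, b)] for e, a, b in zip(step_energies, seq, seq[1:]))


def partition_function(step_energies, kT=1.0):
    """Forward recursion over the chain: O(n_steps * 16) instead of O(4**length)."""
    forward = {b: 1.0 for b in BASES}
    for e in step_energies:
        forward = {
            b: sum(forward[a] * math.exp(-e[(a, b)] / kT) for a in BASES)
            for b in BASES
        }
    return sum(forward.values())


def sequence_probability(seq, step_energies, kT=1.0):
    """Boltzmann probability of one sequence under the chain model."""
    weight = math.exp(-threading_energy(seq, step_energies) / kT)
    return weight / partition_function(step_energies, kT)


def brute_force_partition(step_energies, kT=1.0):
    """Exponential-time check that enumerates every sequence; only for tiny chains."""
    length = len(step_energies) + 1
    return sum(
        math.exp(-threading_energy("".join(s), step_energies) / kT)
        for s in itertools.product(BASES, repeat=length)
    )


if __name__ == "__main__":
    steps = make_toy_step_energies(n_steps=5, seed=42)
    print(threading_energy("ACGTAC", steps))
    print(sequence_probability("ACGTAC", steps))
    print(partition_function(steps), brute_force_partition(steps))
```

Because each energy term couples only neighboring positions, the normalizing constant factorizes along the chain, which is what makes the forward recursion linear in sequence length. The brute-force function is included only to check the recursion on short toy chains.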
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e9b4b90903bef8ce5886c209abb4199 https://pubmed.ncbi.nlm.nih.gov/12169527