A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites
Author: | Brian Naughton, Serafim Batzoglou, Douglas L. Brutlag, Eugene Fratkin |
---|---|
Language: | English |
Year of publication: | 2006 |
Subject: | Gene Expression; Saccharomyces cerevisiae; Biology; 03 medical and health sciences; 0302 clinical medicine; Genetics; Humans; Regulatory Elements, Transcriptional; Binding site; Transcription factor; 030304 developmental biology; 0303 health sciences; Binding Sites; Models, Statistical; Models, Genetic; Nucleotides; Eukaryotic transcription; Computational Biology; Sequence Analysis, DNA; Position weight matrix; Eukaryotic Linear Motif resource; DNA binding site; Graph (abstract data type); Sequence motif; Algorithm; 030217 neurology & neurosurgery; Algorithms; Transcription Factors |
Source: | Nucleic Acids Research |
ISSN: | 1362-4962, 0305-1048 |
Description: | Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but on how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs. |
Database: | OpenAIRE |
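The contrast drawn in the description, between scoring a k-mer against a PSSM and judging it by its similarity to specific known k-mers, can be illustrated with a minimal Python sketch. This is not the MotifScan implementation: the function names, the pseudocount of 0.5, the uniform 0.25 background, and the Hamming-distance threshold are illustrative assumptions, and the neighbour lookup is only a simplified stand-in for the paper's graph-based method, whose details this record does not give. The example uses two known sites, `AAAA` and `TTTT`, to show how a PSSM, which scores each position independently, cannot capture the dependency between positions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log-odds PSSM from equal-length aligned sites (assumes a uniform background)."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pssm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm


def pssm_score(pssm, kmer):
    # Each position is scored independently and the scores are summed,
    # so dependencies between positions are invisible to this model.
    return sum(column[base] for column, base in zip(pssm, kmer))


def hamming(a, b):
    """Number of mismatched positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))


def similar_known_sites(kmer, sites, max_dist=1):
    """Known sites within max_dist mismatches of kmer (hypothetical helper):
    the candidate's neighbours if k-mers are joined by an edge in a Hamming graph."""
    return [site for site in sites if hamming(kmer, site) <= max_dist]


if __name__ == "__main__":
    sites = ["AAAA", "TTTT"]
    pssm = build_pssm(sites)
    # The PSSM gives the unseen mixture AATT the same score as a real site.
    print(pssm_score(pssm, "AAAA"), pssm_score(pssm, "AATT"))
    # The similarity view finds a close known site only for AAAA.
    print(similar_known_sites("AAAA", sites), similar_known_sites("AATT", sites))
```

Here `AATT` receives exactly the same PSSM score as the genuine site `AAAA`, although it differs from every known site at two positions; the similarity-based check accepts `AAAA` and rejects `AATT`. This is the kind of nucleotide dependency that, according to the description, motivates a representation built on individual known k-mers rather than on per-position frequencies.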