Predicting DNA-binding specificities of eukaryotic transcription factors
Author: | Johannes Eichner, Jonas Eichner, Dierk Wanke, Carsten Henneges, Andreas Zell, Jochen Supper, Adrian Schröder |
---|---|
Year of publication: | 2010 |
Subject: | Protein domain; Amino Acid Motifs; Molecular Sequence Data; lcsh:Medicine; Computational Biology/Transcriptional Regulation; Sequence alignment; Biology; DNA-binding protein; Binding Competitive; 03 medical and health sciences; chemistry.chemical_compound; Mice; Animals; Humans; Amino Acid Sequence; Binding site; lcsh:Science; Peptide sequence; Transcription factor; 030304 developmental biology; Genetics; 0303 health sciences; Multidisciplinary; Binding Sites; 030302 biochemistry & molecular biology; Eukaryotic transcription; lcsh:R; Computational Biology; Reproducibility of Results; DNA; Genetics and Genomics/Bioinformatics; Rats; DNA-Binding Proteins; chemistry; Computational Biology/Sequence Motif Analysis; lcsh:Q; Algorithms; Protein Binding; Transcription Factors; Research Article; Computational Biology/Genomics |
Source: | PLoS ONE, Vol 5, Iss 11, p e13876 (2010) |
ISSN: | 1932-6203 |
Description: | Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, have been experimentally characterized for only a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure of the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy. |
Database: | OpenAIRE |
External link: |
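
The abstract in this record refers to position frequency matrices (PFMs) and to a quantitative measure of PFM similarity. As a minimal sketch of those two ideas only, the Python below builds a PFM by counting bases per position in a set of aligned binding sites and compares two PFMs column by column. The function names, the example sequences, and the similarity score (one minus the mean normalized Euclidean distance between column frequencies) are illustrative assumptions; the record does not say which similarity measure or sequence features the authors used, and the support vector regression step is not shown.

```python
from collections import Counter
from math import sqrt

BASES = "ACGT"  # column order used for every PFM row below


def build_pfm(sites):
    """Count base occurrences per position across equal-length binding sites.

    Returns a list with one [A, C, G, T] count vector per motif position.
    """
    if not sites:
        raise ValueError("need at least one binding site")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    pfm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pfm.append([counts.get(base, 0) for base in BASES])
    return pfm


def to_frequencies(pfm):
    """Convert raw counts to relative frequencies, position by position."""
    return [[count / sum(column) for count in column] for column in pfm]


def pfm_similarity(pfm_a, pfm_b):
    """Hypothetical similarity score in [0, 1] for two equal-length PFMs.

    Assumption for illustration: 1 minus the mean Euclidean distance between
    corresponding frequency columns, scaled by sqrt(2), the largest possible
    distance between two probability vectors.
    """
    freq_a, freq_b = to_frequencies(pfm_a), to_frequencies(pfm_b)
    if len(freq_a) != len(freq_b):
        raise ValueError("PFMs must have the same number of positions")
    distances = [
        sqrt(sum((x - y) ** 2 for x, y in zip(col_a, col_b))) / sqrt(2)
        for col_a, col_b in zip(freq_a, freq_b)
    ]
    return 1 - sum(distances) / len(distances)


# Made-up aligned sites for two hypothetical transcription factors.
sites_tf1 = ["ACGT", "ACGA", "ACGT", "TCGT"]
sites_tf2 = ["ACGT", "ACGT", "GCGT", "ACGC"]

pfm_tf1 = build_pfm(sites_tf1)
pfm_tf2 = build_pfm(sites_tf2)
score = pfm_similarity(pfm_tf1, pfm_tf2)
```

In a regression setting like the one the abstract describes, a score of this kind would play the role of the target value, with features derived from the two protein sequences as inputs; that pairing is stated here only as a reading of the abstract.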