Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study

Authors: Ilya E. Vorontsov, Romain Groux, Ivo Grosse, Ivan V. Kulakovskiy, Oriol Fornes, Daria D. Nikolaeva, Philipp Bucher, Benoit Ballester, Jan Grau, Giovanna Ambrosini, Dmitry Penzar, Vsevolod J. Makeev
Contributors: Ecole Polytechnique Fédérale de Lausanne (EPFL), Russian Academy of Sciences [Moscow] (RAS), University of British Columbia (UBC), Lomonosov Moscow State University (MSU), Theories and Approaches of Genomic Complexity (TAGC), Aix Marseille Université (AMU)-Institut National de la Santé et de la Recherche Médicale (INSERM), Martin-Luther-University Halle-Wittenberg, Swiss government via the Swiss Institute of Bioinformatics, and the European Union via the COST Action CA5205 - GREEKC (coordinator Martin Kuiper)
Language: English
Year of publication: 2020
Subject:
lcsh:QH426-470
[SDV]Life Sciences [q-bio]
Cooperativity
chemical and pharmacologic phenomena
Computational biology
Biology
Mice
03 medical and health sciences
0302 clinical medicine
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
PBM
Animals
Humans
Protein Interaction Domains and Motifs
PWM
Cluster analysis
lcsh:QH301-705.5
Transcription factor
database
Binding selectivity
030304 developmental biology
Transcription factor binding sites
0303 health sciences
Research
specificities
Benchmarking
DNA-binding domain
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
DNA binding site
ChIP-seq
lcsh:Genetics
lcsh:Biology (General)
cis-regulatory elements
tools
Chromatin Immunoprecipitation Sequencing
HT-SELEX
protein-dna binding
Software
discovery
030217 neurology & neurosurgery
Transcription Factors
De facto standard
Source: Genome Biology
Genome Biology, BioMed Central, 2020, 21 (1), ⟨10.1186/s13059-020-01996-3⟩
Genome Biology, Vol 21, Iss 1, Pp 1-18 (2020)
ISSN: 1465-6906
1474-760X
Description: Background: The positional weight matrix (PWM) is the de facto standard model for describing transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models against large experimental reference sets. Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best-performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, as indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best-performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity. Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as Docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
Database: OpenAIRE