Predictive Models of Genome-wide Aryl Hydrocarbon Receptor DNA Binding Reveal Tissue Specific Binding Determinants

Authors: David Filipovic, Wenjie Qi, Omar Kana, Daniel Marri, Edward L. LeCluyse, Melvin E. Andersen, Suresh Cuddapah, Sudin Bhattacharya
Year of publication: 2022
Description: Background: The Aryl Hydrocarbon Receptor (AhR) is an inducible transcription factor (TF) whose ligands include the environmental contaminant 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). TCDD-mediated toxicity occurs through activation of AhR and its subsequent binding to the Dioxin Response Element (DRE), which comprises the DNA motif 5′-GCGTG-3′. However, AhR binding in human tissues is highly dynamic and tissue-specific. Approximately 50% of all experimentally verified AhR binding sites do not contain a DRE, and most accessible DREs are not bound by AhR. Identifying tissue-specific determinants of AhR binding is crucial for understanding downstream gene regulation and the potential adverse outcomes of AhR activation. Results: We applied XGBoost, a supervised machine learning algorithm based on gradient-boosted decision trees, to predict the genome-wide AhR binding status of DREs in open chromatin as a function of the DNA sequence flanking the DRE, chromatin accessibility, histone modifications (HM), TF binding, and proximity of the DRE to gene promoters. We trained and validated our models using 5-fold cross-validation to predict the binding status of DREs in AhR-activated MCF-7 breast cancer cells, primary human hepatocytes, and lymphoblastoid GM17212 cells, as well as in AhR non-activated HepG2 hepatocellular carcinoma cells. The resulting models of AhR binding are highly accurate and robust, and they identify patterns of TF binding and histone modifications that are predictive of AhR binding. These patterns are consistent within tissues but highly variable across tissues, suggesting tissue-specific mechanisms of AhR binding. Conclusions: AhR binding is driven by a complex interplay between the tissue-agnostic DNA sequence flanking its binding motif and the tissue-specific local chromatin context.
Database: OpenAIRE
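
The description names the DRE core motif (GCGTG) and lists the DNA sequence flanking the DRE as one of the model inputs. As an illustration only, the Python sketch below shows one way such sequence features could be prepared: it scans a sequence for the core on both strands, extracts a window around each hit, and one-hot encodes it. The function names, the flank length of 7, and the encoding are assumptions made for this example and are not taken from the record; it is not the authors' pipeline.

```python
from typing import List, Tuple

DRE_CORE = "GCGTG"  # core motif as given in the description
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_dres(seq: str, flank: int = 7) -> List[Tuple[int, str, str]]:
    """Locate DRE cores on either strand and return their flanked windows.

    Each hit is (start index on the input strand, strand, window), where the
    window is reported 5'->3' on the strand that carries GCGTG. Hits too close
    to the sequence ends to have full flanks are skipped. The default flank
    length is a hypothetical choice for this sketch.
    """
    seq = seq.upper()
    rc_core = reverse_complement(DRE_CORE)  # "CACGC"
    k = len(DRE_CORE)
    hits = []
    for i in range(flank, len(seq) - k - flank + 1):
        core = seq[i:i + k]
        window = seq[i - flank:i + k + flank]
        if core == DRE_CORE:
            hits.append((i, "+", window))
        elif core == rc_core:
            hits.append((i, "-", reverse_complement(window)))
    return hits


def one_hot(window: str) -> List[int]:
    """Flatten a window into four indicator values per base (A, C, G, T)."""
    return [int(base == b) for base in window for b in "ACGT"]
```

A feature row for a classifier such as XGBoost could then be formed by concatenating `one_hot(window)` with per-site chromatin measurements (accessibility, histone modification and TF binding signals, promoter distance), which would have to come from separate experimental data.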