Additional file 1 of Biased visibility in Hi-C datasets marks dynamically regulated condensed and decondensed chromatin states genome-wide

Authors: Keerthivasan Chandradoss, Prashanth Guthikonda, Srinivas Kethavath, Monika Dass, Harpreet Singh, Rakhee Nayak, Sreenivasulu Kurukuti, Kuljeet Sandhu
Publication year: 2020
DOI: 10.6084/m9.figshare.11888511
Description: Additional file 1:
Table S1. Details of the datasets used in the study. The uniform resource locators (URLs) of NCBI GEO, ENCODE, UCSC genome browser and ArrayExpress are https://www.ncbi.nlm.nih.gov/geo/, https://www.encodeproject.org/, https://genome.ucsc.edu/ and https://www.ebi.ac.uk/arrayexpress/ respectively.
Figure S1. Related to Fig. 1. (a) Loess correction for the negative scaling of raw read counts against the restriction site (RE) density in 10 kb genomic bins. The left panel shows the data before, and the right panel after, loess correction of read counts against RE density. (b) Loess correction for the positive scaling of RE-corrected read counts against the GC content of 10 kb genomic bins. The first panel shows a scatter plot of GC content vs. RE-corrected read counts. The second panel shows a scatter plot of GC content vs. GC- and RE-corrected read counts. The third panel shows RE density vs. GC- and RE-corrected read counts. The fourth panel shows the scaling of GC- and RE-corrected read counts against the raw read counts. (c) Corrected 1D read count as a function of mappability score (≥0.8). The datasets analysed are named at the top of each panel. (d) Analysis of RED-seq data by directly mapping the reads to the mm10 assembly of the mouse genome. The scatter plot of naked DNA vs. in-situ chromatin recaptures the pattern shown in Fig. 1a. The distributions of corrected read counts of in-situ digested chromatin and in-solution digested naked DNA for cLAD and ciLAD regions echo our observations in Fig. 1b. (e) The distribution of corrected 1D Hi-C read counts in mESC. (f) Size distribution of domains identified through analysis of corrected read counts. Plotted are the mean values with standard error bars. (g) Genomic coverage of condensed domains within constitutive LAD (cLAD) and constitutive inter-LAD (ciLAD) regions. Shown are pie charts of 10 kb bins mapping to cLADs and ciLADs in different datasets.
Figure S2. Related to Fig. 1. (a) Distributions of raw and corrected read counts in cLADs and ciLADs across different in-situ Hi-C datasets in mouse. (b) Distributions of raw and corrected read counts in cLADs and ciLADs across different in-situ Hi-C datasets in human. We calculated p-values using the two-tailed Mann-Whitney U test.
Figure S3. Related to Fig. 1. (a) Distributions of bowtie-processed raw and corrected read counts in cLADs and ciLADs across in-situ Hi-C datasets of mESC, NPC and CN cells. We calculated p-values using the two-tailed Mann-Whitney U test. (b) Side-by-side comparison of raw and corrected read counts mapping to cLADs and ciLADs in in-situ and in-solution (dilution) Hi-C datasets obtained for the same cells (mouse fetal liver) from the same study. (c) Scatter plot of corrected read counts obtained from in-situ and in-solution Hi-C datasets.
Figure S4. Related to Fig. 2. (a) Distribution of interaction frequencies of decondensed-to-decondensed and condensed-to-condensed interactions as a function of genomic distance in the raw, HiCNorm-corrected and ICE-corrected Hi-C, and GAM datasets. Upper and lower panels show plots without and with DiSCO corrections respectively. Both axes are log10 transformed and the y-axis was further scaled from 0 to 1 for comparison across plots. (b) Distribution of ICE+DiSCO corrected 1D read counts in the condensed and decondensed domains. (c) Additional examples comparing the corrected 1D read counts and the contact matrices of raw, HiCNorm-corrected, ICE-corrected Hi-C, and the GAM datasets. (d-e) Additional examples comparing the contact matrices of raw, HiCNorm-corrected, and ICE-corrected Hi-C datasets with and without DiSCO correction.
Figure S5. Related to Fig. 4. (a) Scatter plots of corrected read counts in mESC vs. NPC and in NPC vs. CN. (b) Enrichment of histone modifications around the boundary between decondensed and condensed domains in mouse cortical neurons (CN). (c) Enrichment of various genomic attributes around domain boundaries and domain centers. (d) Distribution of condensed and decondensed states of chromatin domains during mESC to NPC differentiation. (e-g) Examples of histone modification profiles around ciLAD condensed and decondensed domains in mESC and NPC. (h) Visibility bias at the polycomb-regulated HoxA and HoxD loci. These loci are condensed in mESC through polycomb proteins, but are decondensed in NPC. The corrected 1D read counts of Hi-C (Fraser et al. 2016) corroborated this pattern.
Figure S6. Related to Fig. 4. (a) Enrichment of Gene Ontology Process terms among the genes exhibiting condensation (left) and decondensation (right) during the ESC-to-NPC transition. Shown are the top 30 terms from ToppGene Suite. Nervous system associated terms are highlighted in brown colour. (b) Significance of overlap between MSigDB gene sets and the genes exhibiting condensation (left) and decondensation (right) during the ESC-to-NPC transition. Shown are the top 30 terms from Gene Set Enrichment Analysis (GSEA). Polycomb associated terms are highlighted in brown colour. The vertical dashed line in each plot marks an FDR of 0.05.
Figure S7. (a) Analysis of native Hi-C data. Scatter plots represent the correlation between 1D reads of in-situ Hi-C and native Hi-C. (b) The boxplots represent the distributions of raw and corrected 1D read counts of in-situ and native Hi-C for the cLAD and ciLAD regions. (c) An example of chromosomal tracks of raw and corrected read counts of in-situ and native Hi-C. (d) Chromosomal tracks of raw and corrected read counts for the region Chr2: 120–240 Mb in the K562 cell line. These tracks should be viewed in approximate alignment with Fig. 4F of Belaghzal et al., bioRxiv 2019. (e) Distributions of raw and corrected read counts of MNase-digested chromatin for polytene band and inter-band regions in the Drosophila Kc167 cell line. From left to right are the boxplots for different time points of MNase digestion. It is apparent that after 40 min of MNase digestion, both the band and inter-band regions exhibit similar levels of read counts, implying a lack of bias.
Database: OpenAIRE