Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping
Author: | Ye Zheng, Rene Welch, Xin Zeng, Constanza Rojo, Sunduz Keles, Colin N. Dewey, Bo Li |
---|---|
Year of publication: | 2014 |
Subject: |
Chromatin Immunoprecipitation;
Molecular Sequence Data; Sequence alignment; Genomics; Computational biology; Biology; ENCODE; Genome; DNase-Seq; DNA sequencing; Cellular and Molecular Neuroscience; Segmental Duplications, Genomic; Protein Interaction Mapping; Genetics; Humans; lcsh:QH301-705.5; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Segmental duplication; Repetitive Sequences, Nucleic Acid; Ecology; Base Sequence; Chromosome Mapping; High-Throughput Nucleotide Sequencing; DNA; DNA-Binding Proteins; lcsh:Biology (General); Computational Theory and Mathematics; Modeling and Simulation; K562 Cells; Chromatin immunoprecipitation; Algorithms; Research Article |
Source: | PLoS Computational Biology, Vol 11, Iss 10, p e1004491 (2015) |
ISSN: | 1553-7358 |
Description: | Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells’ regulatory programs. Advancements in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map, since short reads of 50–100 base pairs (bps) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that can accommodate them are prone to high false-positive and false-negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach makes it possible to supervise multi-read allocation with a variety of data sources, including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of protein-DNA interactions in non-repetitive regions.
Author Summary: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is widely used for studying in vivo protein-DNA interactions genome-wide. The applicability of this method for profiling repetitive regions of the genome is limited by the short read lengths that dominate ChIP-seq applications. We present Perm-seq, which implements a novel generative model for mapping short reads to repetitive regions of genomes. Perm-seq introduces a new class of read alignment algorithms that can combine data from multiple sources. We show, with both computational experiments and the analysis of large volumes of ENCODE ChIP-seq data, that utilizing DNase-seq-derived priors in Perm-seq is especially powerful in mapping protein-DNA interactions in segmental duplication regions. This general approach enables the use of any number of histone ChIP-seq datasets, alone or together with DNase data, to supervise read allocation. Our large-scale analysis reveals that although protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of protein-DNA interactions in non-repetitive regions. |
Database: | OpenAIRE |
External link: |