RACS: rapid analysis of ChIP-Seq data for contig based genomes

Authors: Jeffrey S. Fillingham, Marcelo Ponce, Syed Nabeel-Shah, Alejandro Saettone
Language: English
Year of publication: 2019
Subjects:
Computer science
ved/biology.organism_classification_rank.species
Computational biology
Bioinformatics pipeline
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
Genome
DNA sequencing
Tetrahymena thermophila
03 medical and health sciences
0302 clinical medicine
Structural Biology
Next generation sequencing
False positive paradox
Humans
Quantitative Biology - Genomics
Model organism
Molecular Biology
lcsh:QH301-705.5
High-performance computing
030304 developmental biology
Genomics (q-bio.GN)
0303 health sciences
Contig
ved/biology
Applied Mathematics
Methodology Article
Chromosome Mapping
Molecular Sequence Annotation
Genomics
Sequence Analysis
DNA

Pipeline (software)
Chromatin immunoprecipitation
Computer Science Applications
Data set
Pipeline transport
lcsh:Biology (General)
FOS: Biological sciences
lcsh:R858-859.7
Chromatin Immunoprecipitation Sequencing
030217 neurology & neurosurgery
Source: BMC Bioinformatics
BMC Bioinformatics, Vol 20, Iss 1, Pp 1-17 (2019)
ISSN: 1471-2105
Description: Background: Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-Seq) is a widely used technique for investigating the function of chromatin-related proteins genome-wide. ChIP-Seq generates large quantities of data that can be difficult to process and analyze, particularly for organisms with contig-based genomes. Contig-based genomes often have poor annotations for cis-elements, such as enhancers, that are important for gene expression. Poorly annotated genomes make a comprehensive analysis of ChIP-Seq data difficult, and as a result standardized analysis pipelines are lacking. Methods: We report a computational pipeline that uses traditional high-performance computing techniques and open-source tools to process and analyze ChIP-Seq data. We applied our computational pipeline, "Rapid Analysis of ChIP-Seq data" (RACS), to ChIP-Seq data generated in the model organism Tetrahymena thermophila, an organism whose genome is available in contigs. Results: To test the performance and efficiency of RACS, we performed control ChIP-Seq experiments, which allowed us to rapidly eliminate false positives when analyzing our previously published data set. Our pipeline separates the detected read accumulations into genic and intergenic regions and is highly efficient for rapid downstream analyses. Conclusions: Altogether, the computational pipeline presented in this report is an efficient and highly reliable tool for analyzing genome-wide ChIP-Seq data generated in model organisms with contig-based genomes. RACS is an open-source computational pipeline available for download from https://bitbucket.org/mjponce/racs or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS
Submitted to BMC Bioinformatics. Computational pipeline available at https://bitbucket.org/mjponce/racs
Database: OpenAIRE