Notos - a Galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types

Authors: Ingo Bulla, Christoph Grunau, Benoît Aliaga, Virginia Lacal, Jan Bulla, Cristian Chaparro
Contributors: Los Alamos National Laboratory (LANL), Interactions Hôtes-Pathogènes-Environnements (IHPE), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Institut Français de Recherche pour l'Exploitation de la Mer (IFREMER)-Université de Perpignan Via Domitia (UPVD), Laboratoire de Mathématiques Nicolas Oresme (LMNO), Centre National de la Recherche Scientifique (CNRS)-Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Normandie Université (NU)
Language: English
Year of publication: 2017
Subject:
0301 basic medicine
Citrus
CpG o/e ratio
Gaussian
Kernel density estimation
Grasshoppers
Computational biology
Moths
Biology
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
CpN o/e ratio
03 medical and health sciences
symbols.namesake
0302 clinical medicine
Structural Biology
Animals
Cluster Analysis
Humans
Epigenetics
lcsh:QH301-705.5
Molecular Biology
Mathematics
030304 developmental biology
Genetics
Alligators and Crocodiles
0303 health sciences
DNA methylation
Neurospora crassa
[SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE]
Applied Mathematics
Methylation
Empirical distribution function
Computer Science Applications
030104 developmental biology
lcsh:Biology (General)
CpG site
symbols
lcsh:R858-859.7
CpG Islands
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
DNA microarray
Frequency distribution
Software
030217 neurology & neurosurgery
Research Article
Source: BMC Bioinformatics (1471-2105) (BioMed Central Ltd), 2018, Vol. 19, N. 105, P. 13p.
BMC Bioinformatics
BMC Bioinformatics, BioMed Central, 2018, 19, pp.105. ⟨10.1186/s12859-018-2115-4⟩
BMC Bioinformatics, Vol 19, Iss 1, Pp 1-13 (2018)
ISSN: 1471-2105
DOI: 10.1101/180463
Description: Background: DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. However, the relatively high costs and technical challenges associated with the detection of DNA methylation have biased the number of methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types based on the higher mutability of methylated CpG. Results: Predominantly model-based approaches, essentially founded on mixtures of Gaussian distributions, are currently used to investigate questions related to the number and position of modes of CpG o/e ratios. These approaches require the selection of an appropriate criterion for determining the best model and will fail if empirical distributions are complex or even merely moderately skewed. We use a kernel density estimation (KDE) based technique for robust and precise characterization of complex CpN o/e distributions without a priori assumptions about the underlying distributions. Conclusions: We show that KDE delivers robust descriptions of CpN o/e distributions. For straightforward processing, we have developed a Galaxy tool, called Notos and available at the ToolShed, that calculates these ratios of input FASTA files and fits a density to their empirical distribution. Based on the estimated density, the number and shape of modes of the distribution are determined, providing a rationale for the prediction of the number and the types of different methylation classes. Notos is written in R and Perl.
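Illustrative sketch (not part of the original record): the workflow summarized in the description, computing a CpG o/e ratio per sequence, fitting a kernel density to the ratios, and counting modes, might look roughly as follows. Notos itself is written in R and Perl; this Python version is only an illustration. The normalisation used in `cpg_oe`, the Silverman rule-of-thumb bandwidth, the grid-based mode search, and all function names are assumptions made for this sketch and are not taken from the tool.

```python
import math
import statistics


def cpg_oe(seq):
    """CpG observed/expected ratio for one sequence.

    Assumption: uses the normalisation (CpG / (C * G)) * (L^2 / (L - 1)),
    one of several definitions found in the literature.
    Returns None when the ratio is undefined.
    """
    s = seq.upper()
    length = len(s)
    c, g, cg = s.count("C"), s.count("G"), s.count("CG")
    if length < 2 or c == 0 or g == 0:
        return None
    return (cg / (c * g)) * (length * length / (length - 1))


def gaussian_kde(values, bandwidth=None):
    """Return a Gaussian kernel density estimate as a callable.

    Assumption: when no bandwidth is given, Silverman's rule of thumb
    (1.06 * sd * n^(-1/5)) is used; the tool may select it differently.
    """
    n = len(values)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def count_modes(values, grid_size=512, bandwidth=None):
    """Count local maxima of the estimated density on an evenly spaced grid."""
    density = gaussian_kde(values, bandwidth)
    lo, hi = min(values), max(values)
    pad = 0.1 * (hi - lo) or 1.0  # extend the grid slightly past the data
    step = (hi - lo + 2 * pad) / (grid_size - 1)
    ys = [density(lo - pad + i * step) for i in range(grid_size)]
    return sum(
        1 for i in range(1, grid_size - 1) if ys[i - 1] < ys[i] >= ys[i + 1]
    )
```

In use, one would collect `cpg_oe` values over all sequences of a FASTA file, drop the undefined ones, and pass the list to `count_modes`. The mode count depends strongly on the bandwidth, so any result from this sketch should be read as exploratory.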
Database: OpenAIRE