A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations

Author:	Rasmus Nielsen, Adi Stern, Guy Ling, Danielle Miller
Contributors:	Wilke, Claus
Year of publication:	2020
Subject:	Mutation rate MCMC Adenosine Deaminase Population Bayesian probability Context (language use) Computational biology APOBEC-3G Deaminase Biology Bayesian inference sequence context 03 medical and health sciences symbols.namesake Viral Proteins 0302 clinical medicine Genetic Models evolutionary model 2.5 Research design and methodologies (aetiology) Methods Genetics Point Mutation Aetiology education Molecular Biology Ecology Evolution Behavior and Systematics 030304 developmental biology Sequence (medicine) 0303 health sciences education.field_of_study Evolutionary Biology Models Genetic Base Sequence mutation rates Computational Biology High-Throughput Nucleotide Sequencing population genetics Statistical model Markov chain Monte Carlo Bayes Theorem Poliovirus symbols HIV-1 HIV/AIDS Generic health relevance Biochemistry and Cell Biology 030217 neurology & neurosurgery
Source:	Molecular Biology and Evolution, vol 37, iss 3
Description:	The probability of point mutations is expected to be highly influenced by the flanking nucleotides that surround them, known as the sequence context. This phenomenon may be mainly attributed to the enzyme that modifies or mutates the genetic material, because most enzymes tend to have specific sequence contexts that dictate their activity. Here, we develop a statistical model that allows for the detection and evaluation of the effects of different sequence contexts on mutation rates from deep population sequencing data. This task is computationally challenging, as the complexity of the model increases exponentially as the context size increases. We established our novel Bayesian method based on sparse model selection methods, with the leading assumption that the number of actual sequence contexts that directly influence mutation rates is minuscule compared with the number of possible sequence contexts. We show that our method is highly accurate on simulated data using pentanucleotide contexts, even when accounting for noisy data. We next analyze empirical population sequencing data from polioviruses and HIV-1 and detect a significant enrichment in sequence contexts associated with deamination by the cellular deaminases ADAR 1/2 and APOBEC3G, respectively. In the current era, where next-generation sequencing data are highly abundant, our approach can be used on any population sequencing data to reveal context-dependent base alterations and may assist in the discovery of novel mutable sites or editing sites.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::85379a94d7ac569ba7e90d04fe0c1415 https://escholarship.org/uc/item/9m96m177