The Simulated Random Assignment of Missense Mutations Throughout a Gene of Interest Can Determine Whether Missense Mutations Found in That Gene in a Population of Tumor Genomes Are Non-Randomly Distributed v1

Authors: Richard L Cullum, David J Riese II
Year of publication: 2021
DOI: 10.17504/protocols.io.bwtwpepe
Description: Human malignancies result from the accumulation of genetic and epigenetic changes in normal cells. In many malignancies, gain-of-function mutations in oncogenes and loss-of-function mutations in tumor suppressor genes drive tumorigenesis and tumor progression. Identifying tumor driver mutations, and the genes that host them, is critical for the molecular staging and targeted therapy of malignancies. Because tumor driver mutations cause tumorigenesis or tumor progression, the proliferation of tumor cells selects for these mutations. Thus, in a gene that hosts tumor driver mutations, mutations will be non-randomly distributed across the gene, because mutations that confer a selective advantage on the tumor cells will predominate over mutations that do not. Consider a particular gene in a population of tumor genomes, and define coincident missense mutations as two or more missense mutations that affect the same codon. If the gene hosts tumor driver mutations, the total number of coincident missense mutations observed in that gene is expected to exceed the total number that would arise through random assignment of the same number of missense mutations across the gene. Consequently, here we use the R statistical computing environment to simulate the random assignment of missense mutations across a user-specified gene. The number of randomly assigned missense mutations is defined by the user and should equal the total number of missense mutations observed in the gene of interest in the collection of tumor genomes of interest. From the simulated random assignment, the R code determines the total number of simulated coincident and non-coincident mutations. The simulation is repeated a user-defined number of times, and the average numbers of simulated coincident and non-coincident mutations are calculated from the set of simulations.
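The simulation described above can be illustrated with a minimal sketch. It is written in Python for illustration and is not the authors' R code. It assumes that every codon is equally likely to receive a mutation and that a mutation counts as coincident when its codon receives two or more mutations; the function names `simulate_once` and `average_counts`, and all input numbers, are hypothetical.

```python
import random
from collections import Counter


def simulate_once(n_codons, n_mutations, rng):
    """Randomly assign n_mutations to codons; return (coincident, non_coincident).

    Assumption: each codon is equally likely to be hit (uniform assignment).
    """
    # Count how many simulated mutations land on each codon index.
    hits = Counter(rng.randrange(n_codons) for _ in range(n_mutations))
    # A mutation is coincident if its codon received two or more mutations.
    coincident = sum(count for count in hits.values() if count >= 2)
    return coincident, n_mutations - coincident


def average_counts(n_codons, n_mutations, n_simulations, seed=0):
    """Repeat the simulation and return the mean (coincident, non_coincident)."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    total_coincident = 0
    for _ in range(n_simulations):
        coincident, _ = simulate_once(n_codons, n_mutations, rng)
        total_coincident += coincident
    mean_coincident = total_coincident / n_simulations
    return mean_coincident, n_mutations - mean_coincident


if __name__ == "__main__":
    # Hypothetical inputs: a 1000-codon gene carrying 200 observed missense mutations.
    mean_c, mean_n = average_counts(n_codons=1000, n_mutations=200, n_simulations=500)
    print(f"mean coincident: {mean_c:.1f}, mean non-coincident: {mean_n:.1f}")
```

Under these assumptions the two averages always sum to the number of assigned mutations, so they can serve directly as the expected counts in a two-category comparison with the observed data.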
The R code then uses a chi-square test to determine whether the observed number of coincident mutations (in the gene of interest in a collection of tumor genomes) significantly exceeds the average number of simulated coincident mutations. A positive result indicates that missense mutations are non-randomly distributed in the gene and suggests that the gene hosts tumor driver mutations. We have used this R code to analyze mutations in the ERBB4 receptor tyrosine kinase gene that are found in The Cancer Genome Atlas (TCGA) dataset. Our analysis indicates that the number of coincident mutations observed in ERBB4 in the TCGA dataset is significantly greater than the number of coincident mutations that arise from the simulated random assignment of missense mutations across the ERBB4 gene. This finding indicates that the distribution of missense mutations in ERBB4 in the TCGA dataset is non-random.
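The comparison described above can be sketched as a two-category chi-square goodness-of-fit test. This is a Python illustration and not the authors' R code; the description does not state the exact form of the test, so treating the average simulated counts as the expected values is an assumption. The function names and the 0.05 threshold are hypothetical. With two categories there is one degree of freedom, for which the chi-square upper-tail probability equals erfc(sqrt(x/2)).

```python
import math


def chi_square_coincidence(obs_coincident, obs_non, exp_coincident, exp_non):
    """Chi-square goodness-of-fit on (coincident, non-coincident) counts.

    Assumption: the expected counts are the simulation averages and have the
    same total as the observed counts. Returns (statistic, p_value), df = 1.
    """
    observed = (obs_coincident, obs_non)
    expected = (exp_coincident, exp_non)
    if min(expected) <= 0:
        raise ValueError("expected counts must be positive")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def exceeds_random(obs_coincident, obs_non, exp_coincident, exp_non, alpha=0.05):
    """True if observed coincident mutations significantly exceed expectation."""
    _, p_value = chi_square_coincidence(
        obs_coincident, obs_non, exp_coincident, exp_non
    )
    # The chi-square test ignores direction, so check the direction explicitly.
    return obs_coincident > exp_coincident and p_value < alpha


if __name__ == "__main__":
    # Hypothetical counts: 40 observed coincident mutations versus 20 expected.
    stat, p = chi_square_coincidence(40, 160, 20.0, 180.0)
    print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

The direction check matters because the statistic is equally large when the observed coincident count falls below the simulated average, which would not suggest selection for recurrent mutations.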
Database: OpenAIRE