hogwash: Three Methods for Genome-Wide Association Studies in Bacteria

Author: Saund, Katie; Snitkin, Evan S.
Publication year: 2020
Subject:
Source: Microbial Genomics
DOI: 10.1101/2020.04.19.048421
Description: Bacterial genome-wide association studies (bGWAS) capture associations between genomic variation and phenotypic variation. Convergence-based bGWAS methods identify genomic mutations that arise more often in the presence of phenotypic variation than is expected by chance. This work introduces hogwash, an open-source R package that implements three algorithms for convergence-based bGWAS. Hogwash additionally contains a novel grouping tool that performs gene- or pathway-level analysis to improve power and increase convergence detection for related but weakly penetrant genotypes. To identify optimal use cases, we applied hogwash to data simulated with a variety of phylogenetic signals and convergence distributions. These simulated data are publicly available and contain the relevant metadata regarding convergence and phylogenetic signal for each phenotype and genotype. Hogwash is available for download from GitHub. DATA SUMMARY: hogwash is available from GitHub under the MIT license (https://github.com/katiesaund/hogwash) and can be installed using the R command devtools::install_github("katiesaund/hogwash"). The simulated data used in this manuscript and the code to generate them are available from GitHub (https://github.com/katiesaund/simulate_data_for_convergence_based_bGWAS). IMPACT STATEMENT: We introduce hogwash, an R package with three methods for bacterial genome-wide association studies. There are two methods for handling binary phenotypes, including an implementation of PhyC (1), as well as one method for handling continuous phenotypes. We formulate two novel indices quantifying the relationship between phenotype convergence and genotype convergence on a phylogenetic tree, one for binary phenotypes and one for continuous phenotypes. These indices provide an intuitive understanding of the ability of hogwash to detect significant intersections of phenotype convergence and genotype convergence, and of how to interpret hogwash outputs.
Database: OpenAIRE