hogwash: Three Methods for Genome-Wide Association Studies in Bacteria
Author: | Saund, Katie; Snitkin, Evan S. |
---|---|
Publication year: | 2020 |
Subject: |
bacterial genomics
Method; Genome-wide association study; Computational biology; Biology; 03 medical and health sciences; 0302 clinical medicine; Genotype; GWAS; Humans; Genomic Methodologies: Genome-phenotype association; convergent evolution; Gene; 030304 developmental biology; Genetic association; 0303 health sciences; Bacteria; Phylogenetic tree; software; Genetic Variation; General Medicine; Pathway analysis; R package; Open source; Simulated data; Algorithms; Genome Bacterial; 030217 neurology & neurosurgery |
Source: | Microbial Genomics |
DOI: | 10.1101/2020.04.19.048421 |
Description: | Bacterial genome-wide association studies (bGWAS) capture associations between genomic variation and phenotypic variation. Convergence-based bGWAS methods identify genomic mutations that arise more often in the presence of phenotypic variation than expected by chance. This work introduces hogwash, an open-source R package that implements three algorithms for convergence-based bGWAS. Hogwash also contains a novel grouping tool that performs gene- or pathway-level analysis to improve power and increase convergence detection for related but weakly penetrant genotypes. To identify optimal use cases, we applied hogwash to data simulated with a variety of phylogenetic signals and convergence distributions. These simulated data are publicly available and contain the relevant metadata regarding convergence and phylogenetic signal for each phenotype and genotype. Hogwash is available for download from GitHub. DATA SUMMARY: hogwash is available from GitHub under the MIT license (https://github.com/katiesaund/hogwash) and can be installed using the R command devtools::install_github("katiesaund/hogwash"). The simulated data used in this manuscript and the code to generate them are available from GitHub (https://github.com/katiesaund/simulate_data_for_convergence_based_bGWAS). IMPACT STATEMENT: We introduce hogwash, an R package with three methods for bacterial genome-wide association studies: two methods for binary phenotypes, including an implementation of PhyC(1), and one method for continuous phenotypes. We formulate two novel indices quantifying the relationship between phenotype convergence and genotype convergence on a phylogenetic tree, one for binary phenotypes and one for continuous phenotypes. These indices build intuition for hogwash's ability to detect significant intersections of phenotype convergence and genotype convergence, and for how to interpret hogwash outputs. |
Database: | OpenAIRE |
External link: |