Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS
Autor: | M. Arabnejad, B. A. Dawkins, W. S. Bush, B. C. White, A. R. Harkness, B. A. McKinney |
---|---|
Jazyk: | angličtina |
Rok vydání: | 2018 |
Předmět: | |
Zdroj: | BioData Mining, Vol 11, Iss 1, Pp 1-17 (2018) |
Druh dokumentu: | article |
ISSN: | 1756-0381 90511972 |
DOI: | 10.1186/s13040-018-0186-4 |
Popis: | Abstract Background ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding. Methods We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: 1.) genotype mismatch (GM), which is the ReliefF standard, 2.) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF, and 3.) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity. Results We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias. Conclusions Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, lead to ReliefF attribute scores with improved pathway enrichment. |
Databáze: | Directory of Open Access Journals |
Externí odkaz: | |
Nepřihlášeným uživatelům se plný text nezobrazuje | K zobrazení výsledku je třeba se přihlásit. |