Scalable Nonparametric Prescreening Method for Searching Higher-Order Genetic Interactions Underlying Quantitative Traits
Authors: | Mikko J. Sillanpää, Juho A. J. Kontio |
---|---|
Publication year: | 2019 |
Subject: | Investigations; Quantitative trait locus; Biology; 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; Quantitative Trait, Heritable; Kriging; Genetics; Preprocessor; Computer Simulation; 0101 mathematics; Gaussian process; 030304 developmental biology; 0303 health sciences; Models, Genetic; Dimensionality reduction; Nonparametric statistics; Epistasis, Genetic; ROC Curve; Scalability; Data mining; Scenario testing |
Source: | Genetics |
ISSN: | 1943-2631 |
DOI: | 10.1534/genetics.119.302658 |
Description: | Gaussian process (GP) regression is theoretically capable of capturing, with high accuracy and without an exhaustive search, the higher-order gene-by-gene interactions important to trait variation. Unfortunately, the GP approach scales only to 100-200 genes and is thus not applicable for high... Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits the application of the GP-based ARD method to high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the high-dimensional datasets typically used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from The Cancer Genome Atlas (TCGA), with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia. |
Database: | OpenAIRE |
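To illustrate what "automatic relevance determination" refers to in a GP model, here is a minimal sketch of a generic squared-exponential ARD kernel, in which every input variable gets its own length-scale and a small length-scale marks a relevant variable. This is a textbook ARD construction, not the prescreening method proposed in the paper. The function names, gene labels and length-scale values are hypothetical, and the length-scales are assumed to have already been estimated (for example, by maximizing the GP marginal likelihood, a step not shown here).

```python
import math

def ard_se_kernel(x, y, lengthscales, signal_var=1.0):
    """Squared-exponential covariance with one length-scale per input variable (ARD)."""
    # Each coordinate difference is divided by its own length-scale, so a variable
    # with a very large length-scale contributes almost nothing to the covariance.
    sq = sum(((xi - yi) / l) ** 2 for xi, yi, l in zip(x, y, lengthscales))
    return signal_var * math.exp(-0.5 * sq)

def rank_by_relevance(names, lengthscales):
    """Rank variables by a common ARD relevance heuristic: the inverse length-scale."""
    relevance = {n: 1.0 / l for n, l in zip(names, lengthscales)}
    return sorted(relevance, key=relevance.get, reverse=True)

# Hypothetical length-scales, assumed to be already estimated for three genes.
genes = ["geneA", "geneB", "geneC"]
ls = [0.5, 50.0, 2.0]

y = [0.0, 0.0, 0.0]
x = [1.0, 0.0, 0.0]  # differs from y only in geneA
z = [0.0, 1.0, 0.0]  # differs from y only in geneB

print(ard_se_kernel(x, y, ls))  # changing geneA lowers the covariance sharply
print(ard_se_kernel(z, y, ls))  # changing geneB leaves the covariance close to 1
print(rank_by_relevance(genes, ls))  # → ['geneA', 'geneC', 'geneB']
```

Because the kernel barely reacts to variables with large length-scales, ranking by inverse length-scale gives a simple relevance ordering; the record notes that fitting such a GP is feasible only for roughly 200 variables, which is the bottleneck the paper's prescreening step is meant to address.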