pulver: An R package for parallel ultra-rapid p-value computation for linear regression interaction terms

Author:	Fabian J. Theis, Konstantin Strauch, Gabi Kastenmüller, Melanie Waldenberger, Christian Gieger, Thomas Meitinger, Clemens Baumbach, Karsten Suhre, Harald Grallert, Rui Wang-Sattler, Jerzy Adamski, Sophie Molnos, Martina Müller-Nurasyid, Simone Wahl, Annette Peters
Language:	English
Year of publication:	2017
Subject:	0301 basic medicine; Time Factors; Theoretical computer science; Correlation coefficient; Computer science; Computation; lcsh:Computer applications to medicine. Medical informatics; Polymorphism, Single Nucleotide; Biochemistry; 03 medical and health sciences; Software; Structural Biology; Linear regression; Humans; Computer Simulation; p-value; Molecular Biology; lcsh:QH301-705.5; Genetic association; Linear regression interaction term; Applied Mathematics; SNP–CpG interaction; Linear model; Computer Science Applications; Term (time); Algorithm; 030104 developmental biology; lcsh:Biology (General); Linear Models; lcsh:R858-859.7; CpG Islands; Algorithms
Source:	BMC Bioinformatics 18:429 (2017); BMC Bioinformatics, Vol 18, Iss 1, Pp 1-8 (2017)
Description:	Background: Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is common to investigate the associations between genetic variants and metabolite levels. However, considering only genetic variants and their effects on one trait ignores the possible interplay between different “omics” layers. Existing tools only consider single-nucleotide polymorphism (SNP)–SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. Results: We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This is achieved by using the correlation coefficient to test the null hypothesis, which avoids the costly computation of matrix inversions. Additional speed-ups come from rearranging the order of iteration through the different “omics” layers and from implementing the algorithm in C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. Conclusions: The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1838-y) contains supplementary material, which is available to authorized users.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::391dd143bf1e29b7e60354e5e3482d7a https://push-zb.helmholtz-muenchen.de/frontdoor.php?source_opus=52023
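
Note on the method: the description states that the interaction term is tested through a correlation coefficient instead of a matrix inversion. The sketch below illustrates that general idea only; it is not the pulver implementation and does not use the package's API. It assumes the model y ~ 1 + x + z + x*z, written in plain Python for illustration (the package itself is R and C++). The function names `residualize` and `interaction_t` are hypothetical. The sketch relies on the standard identity that the t-statistic of the interaction coefficient equals r * sqrt(df / (1 - r^2)), where r is the partial correlation between y and x*z given (1, x, z) and df = n - 4.

```python
import math
import random


def residualize(v, basis):
    """Subtract from v its projection onto each vector of an orthogonal basis."""
    out = list(v)
    for b in basis:
        coef = sum(o * bi for o, bi in zip(out, b)) / sum(bi * bi for bi in b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def interaction_t(y, x, z):
    """Return (partial correlation, t-statistic) for the x*z term.

    Assumed model: y ~ 1 + x + z + x*z. No matrix is inverted; the
    covariates are orthogonalized with Gram-Schmidt instead.
    """
    n = len(y)
    # Orthogonal basis for the span of the intercept, x, and z.
    basis = []
    for v in ([1.0] * n, x, z):
        basis.append(residualize(v, basis))
    xz = [a * b for a, b in zip(x, z)]
    ry = residualize(y, basis)    # y with intercept, x, z removed
    rxz = residualize(xz, basis)  # x*z with intercept, x, z removed
    syy = sum(a * a for a in ry)
    sxx = sum(a * a for a in rxz)
    sxy = sum(a * b for a, b in zip(ry, rxz))
    r = sxy / math.sqrt(syy * sxx)
    df = n - 4  # four fitted coefficients
    t = r * math.sqrt(df / (1.0 - r * r))
    return r, t


if __name__ == "__main__":
    # Simulated data, for illustration only.
    random.seed(1)
    n = 100
    x = [random.gauss(0, 1) for _ in range(n)]
    z = [random.gauss(0, 1) for _ in range(n)]
    y = [0.5 * a + 0.3 * b + 0.4 * a * b + random.gauss(0, 1)
         for a, b in zip(x, z)]
    r, t = interaction_t(y, x, z)
    print("partial correlation:", r, "t-statistic:", t)
```

The p-value would then follow from a t distribution with n - 4 degrees of freedom. Because r is monotonically related to |t|, a large screen could compare r against a precomputed threshold and skip the p-value computation for most models; whether pulver does exactly this is not stated in the description above.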