Second-generation PLINK: rising to the challenge of larger and richer datasets

Author:	Chang, Christopher C.; Chow, Carson C.; Tellier, Laurent C. A. M.; Vattikuti, Shashaank; Purcell, Shaun M.; Lee, James J.
Publication year:	2014
Subject:	Quantitative Biology - Genomics; Statistics - Computation; G.3; G.4; J.3
Source:	GigaScience 2015, 4:7
Document type:	Working Paper
DOI:	10.1186/s13742-015-0047-8
Description:	PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
Comment:	2 figures, 1 additional file
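Illustration:	The description names bit-level parallelism without showing what it looks like. The following is a minimal sketch of the general idea only, not code from PLINK: it assumes genotypes are stored as 2-bit codes, 32 per 64-bit word, and tallies all four codes with a few mask-and-popcount operations instead of a per-genotype loop. The names MASK, popcount, pack and count_codes are hypothetical and chosen for this example.

```python
# Sketch of bit-level parallelism on 2-bit packed genotype codes.
# Assumption: 32 codes per 64-bit word, code i stored in bits 2*i and 2*i+1.
# All names here are illustrative; none are taken from the PLINK codebase.

MASK = 0x5555555555555555  # selects the low bit of every 2-bit field


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pack(codes):
    """Pack up to 32 two-bit codes (each 0..3) into one integer word."""
    word = 0
    for i, code in enumerate(codes):
        word |= (code & 0b11) << (2 * i)
    return word


def count_codes(word, n=32):
    """Count how often each 2-bit code occurs among the first n fields."""
    word &= (1 << (2 * n)) - 1      # discard any bits beyond the n fields
    low = word & MASK               # low bit of each field
    high = (word >> 1) & MASK       # high bit of each field, shifted down
    c11 = popcount(low & high)            # both bits set
    c10 = popcount(high & (low ^ MASK))   # high set, low clear
    c01 = popcount(low & (high ^ MASK))   # low set, high clear
    c00 = n - c11 - c10 - c01             # whatever remains
    return {"00": c00, "01": c01, "10": c10, "11": c11}


if __name__ == "__main__":
    word = pack([0, 1, 2, 3, 3, 2, 2, 0])
    print(count_codes(word, n=8))
```

Each call processes a whole word at once, so the work per genotype is a small fraction of one machine operation. How the four codes map to genotype classes depends on the file format and is left out of this sketch.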
Database:	arXiv
External link:	http://arxiv.org/abs/1410.4803