Scaling computational genomics to millions of individuals with GPUs
Author: | Shankara Anand, François Aguet, Sager J. Gosai, Kristin Ardlie, Gad Getz, Eliezer M. Van Allen, Jaegil Kim, Amaro Taylor-Weiner, Nicholas J. Haradhvala |
---|---|
Language: | English |
Publication year: | 2018 |
Subject: |
lcsh:QH426-470; Computer science; Quantitative Trait Loci; Short Report; Genomics; Parallel computing; Biology; Machine Learning; Computer graphics; 03 medical and health sciences; 0302 clinical medicine; Software; Computer Graphics; Graphics; lcsh:QH301-705.5; Scaling; 030304 developmental biology; Pace; 0303 health sciences; Extramural; business.industry; Computational genomics; lcsh:Genetics; lcsh:Biology (General); 030220 oncology & carcinogenesis; Benchmark (computing); business; 030217 neurology & neurosurgery |
Source: | Genome Biology, Vol 20, Iss 1, Pp 1-5 (2019) |
DOI: | 10.1101/470138 |
Description: | Current genomics methods and pipelines were designed to handle tens to thousands of samples, but will soon need to scale to millions to keep up with the pace of data and hypothesis generation in biomedical science. The computational costs associated with processing these growing datasets will become prohibitive without improving the computational efficiency and scalability of methods. Here, we show that implementation of genomics methods using recently developed machine-learning libraries for GPUs will significantly accelerate computations and enable scaling to hundreds of thousands of samples. To demonstrate this and benchmark the use of machine-learning libraries for large-scale genomic analyses, we re-implemented methods for two commonly performed computational genomics tasks: (i) QTL mapping (tensorQTL) and (ii) Bayesian non-negative matrix factorization (SignatureAnalyzer-GPU). Our implementations ran >200 times faster than current CPU-based implementations, e.g., trans-QTL mapping (i.e., 500 billion regressions) took less than 10 minutes, and these analyses are ~5-10-fold cheaper on GPUs due to the vastly shorter runtimes. We anticipate that the accessibility of these libraries (e.g., TensorFlow, PyTorch) and the improvements in runtime will lead to a transition to GPU-based implementations for a wide range of computational genomics methods. |
Database: | OpenAIRE |
External link: |
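
The abstract describes trans-QTL mapping as a very large number of regressions. One reason this kind of workload suits array libraries is that all variant-phenotype correlations can be computed with a single matrix product once both matrices are standardized. The sketch below illustrates that idea only; it is not the tensorQTL code. It uses NumPy on the CPU as a stand-in for a GPU array library, and the function names, matrix sizes, and simulated data are all hypothetical.

```python
import numpy as np


def standardize_rows(m):
    """Center each row and scale it to unit Euclidean norm.

    Assumes every row has non-zero variance (no monomorphic variants).
    """
    centered = m - m.mean(axis=1, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)


def all_pairs_correlation(genotypes, phenotypes):
    """Pearson r for every (variant, phenotype) pair via one matrix product.

    genotypes:  (n_variants, n_samples)
    phenotypes: (n_phenotypes, n_samples)
    returns:    (n_variants, n_phenotypes)
    """
    return standardize_rows(genotypes) @ standardize_rows(phenotypes).T


def correlation_to_t(r, n_samples):
    """Slope t-statistic of a simple linear regression, derived from r."""
    dof = n_samples - 2
    return r * np.sqrt(dof / (1.0 - r ** 2))


# Simulated toy data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
n_samples = 200
genotypes = rng.integers(0, 3, size=(50, n_samples)).astype(float)
phenotypes = rng.normal(size=(10, n_samples))
phenotypes[3] += 0.8 * genotypes[7]  # plant one true association

r = all_pairs_correlation(genotypes, phenotypes)
t = correlation_to_t(r, n_samples)

# Index (variant, phenotype) of the strongest association.
top = np.unravel_index(np.abs(t).argmax(), t.shape)
print("strongest pair:", top)
```

Because the whole computation reduces to row-wise standardization plus one matrix multiplication, the same steps map directly onto the tensor operations offered by libraries such as TensorFlow or PyTorch, which is the kind of port the abstract refers to. Covariate adjustment and permutation-based significance testing are left out of this sketch.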