Performance Evaluation of Python Based Data Analytics Frameworks in Summit: Early Experiences
Author: | Suhas Somnath, Peter Entschev, Zahra Ronaghi, Benjamín Hernández, John Kirkham, Hao Lu, Joe Eaton, Junqi Yin |
---|---|
Publication year: | 2020 |
Subject: |
010302 applied physics; 0303 health sciences; geography; Summit; geography.geographical_feature_category; Exploit; Java; Computer science; Best practice; Python (programming language); Supercomputer; 01 natural sciences; Data science; 03 medical and health sciences; 0103 physical sciences; Scalability; Data analysis; computer; 030304 developmental biology; computer.programming_language |
Source: | Communications in Computer and Information Science ISBN: 9783030633929 SMC |
Description: | The explosion in the volumes of data generated from ever-larger simulation campaigns and experiments or observations necessitates competent tools for data wrangling and analysis. While the Oak Ridge Leadership Computing Facility (OLCF) provides a variety of tools to perform data wrangling and data analysis tasks, Python-based tools often lack scalability, or the ability to fully exploit the computational capability of OLCF’s Summit supercomputer. NVIDIA RAPIDS and Dask offer a promising solution to accelerate and distribute data analytics workloads from personal computers to heterogeneous supercomputing systems. We discuss early performance evaluation results of RAPIDS and Dask on Summit to understand their capabilities, scalability, and limitations. Our evaluation includes a subset of RAPIDS libraries, i.e., cuDF, cuML, and cuGraph, and Chainer’s CuPy, and their multi-GPU variants when available. We also draw on the observed trends from the performance evaluation results to discuss best practices for maximizing performance. |
Database: | OpenAIRE |
External link: |