Big data analytics on HPC architectures: Performance and cost
Author: | Sreenivas R. Sukumar, Jamison Daniel, Peter Xenopoulos, Michael A. Matheson |
Year of publication: | 2016 |
Subject: | Distributed computing, Computer science, Big data, Software development, Petabyte, Software engineering, Cloud computing, Analytics, Operating system, Software architecture |
Source: | IEEE BigData |
DOI: | 10.1109/bigdata.2016.7840861 |
Description: | Data-driven science, accompanied by an explosion of petabytes of data, has created a need for dedicated analytics computing resources. Dedicated analytics clusters require large capital outlays due to their expensive hardware requirements. Additionally, if such resources are located far from the data they analyze, they incur substantial data-transfer overhead, which has both cost and latency implications. In this paper, we benchmark a variety of high-performance computing (HPC) architectures on classic data science algorithms and conduct a cost analysis of these architectures. We also compare algorithms across analytic frameworks and explore hidden costs in the form of queuing mechanisms. We observe that node architectures with large memory and high memory bandwidth are better suited for big data analytics on HPC hardware. We also conclude that cloud computing is more cost effective for small or experimental data workloads, but HPC is more cost effective at scale. Additionally, we quantify the hidden costs of queuing and how they relate to data science workloads. Finally, we observe that software developed for the cloud, such as Spark, performs significantly worse than pbdR when run in HPC environments. |
Database: | OpenAIRE |
External link: |
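The abstract's conclusion that cloud computing is cheaper for small workloads while HPC wins at scale can be illustrated with a toy break-even model. This is a minimal sketch, not the paper's actual cost analysis; all prices (the per-node-hour cloud rate, the HPC capital outlay, and the HPC operating rate) are invented for illustration.

```python
# Toy cloud-vs-HPC cost-crossover model. All dollar figures are
# hypothetical; the paper's real analysis uses measured benchmarks.

def cloud_cost(node_hours, rate=1.50):
    """Pay-as-you-go cloud: cost scales linearly with usage."""
    return node_hours * rate

def hpc_cost(node_hours, capital=500_000.0, op_rate=0.40):
    """Dedicated HPC: fixed capital outlay plus a lower operating rate."""
    return capital + node_hours * op_rate

def break_even(capital=500_000.0, cloud_rate=1.50, op_rate=0.40):
    """Usage level (node-hours) at which HPC becomes the cheaper option."""
    return capital / (cloud_rate - op_rate)

# Small/experimental workload: cloud is cheaper.
print(cloud_cost(1_000), hpc_cost(1_000))          # 1500.0 vs 500400.0
# Large-scale workload: HPC is cheaper.
print(cloud_cost(1_000_000), hpc_cost(1_000_000))  # 1500000.0 vs 900000.0
```

Under these assumed rates the crossover sits near `break_even()` ≈ 454,545 node-hours; below it the pay-as-you-go model dominates, above it the amortized dedicated cluster does, matching the abstract's qualitative finding.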