Scaling up data-parallel analytics platforms: Linear algebraic operation cases

Author: Ramakrishnan Kannan, Luna Xu, Seung-Hwan Lim, Min Li, Ali R. Butt
Year of publication: 2017
Subject:
Source: IEEE BigData
DOI: 10.1109/bigdata.2017.8257935
Description: Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling such algorithms both up and out is key to supporting large-scale data analysis that requires efficient processing over millions of data samples. To this end, we present ARION, a hardware-acceleration-based approach for scaling up individual tasks of Spark, a popular data-parallel analytics platform. We support linear algebraic operations both between two dense matrices and between sparse and dense matrices in distributed environments. ARION provides flexible control of acceleration according to matrix density, along with efficient scheduling based on runtime resource utilization. We demonstrate the benefit of our approach for general matrix multiplication over large matrices with up to four billion elements, using the Gramian matrix computation that is commonly used in machine learning. Experiments show that our approach achieves end-to-end speedups of more than 2× and 1.5× for dense and sparse matrices, respectively, and up to 57.04× faster computation compared to MLlib, a state-of-the-art Spark-based implementation.
This work is sponsored in part by the NSF under grants CNS-1565314, CNS-1405697, and CNS-1615411. The manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
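The central workload in the description, a Gramian matrix G = AᵀA computed over a row-partitioned matrix with a density-dependent choice of kernel, can be illustrated with a minimal pure-Python sketch. It assumes rows are split into partitions as in a Spark RDD, that each partition's partial Gramian is summed in a reduce step, and that a hypothetical `DENSITY_THRESHOLD` decides between a dense and a sparse kernel. In ARION the dense case would be offloaded to hardware acceleration; this sketch does not model that, and the names and threshold are illustrative only.

```python
from typing import List

# Hypothetical density cutoff; not a value taken from ARION.
DENSITY_THRESHOLD = 0.3

Row = List[float]
Matrix = List[List[float]]


def density(rows: List[Row]) -> float:
    """Fraction of nonzero entries in a block of rows."""
    total = sum(len(r) for r in rows)
    if total == 0:
        return 0.0
    nonzeros = sum(1 for r in rows for x in r if x != 0.0)
    return nonzeros / total


def gram_dense(rows: List[Row], n: int) -> Matrix:
    """Dense kernel: add the full outer product r^T r of every row."""
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g


def gram_sparse(rows: List[Row], n: int) -> Matrix:
    """Sparse kernel: only multiply pairs of nonzero entries in each row."""
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        nz = [(i, x) for i, x in enumerate(r) if x != 0.0]
        for i, xi in nz:
            for j, xj in nz:
                g[i][j] += xi * xj
    return g


def gram_partition(rows: List[Row], n: int) -> Matrix:
    """Pick a kernel per partition based on its measured density."""
    if density(rows) >= DENSITY_THRESHOLD:
        return gram_dense(rows, n)
    return gram_sparse(rows, n)


def gram(partitions: List[List[Row]], n: int) -> Matrix:
    """Reduce step: sum the partial Gramians of all partitions into A^T A."""
    g = [[0.0] * n for _ in range(n)]
    for part in partitions:
        pg = gram_partition(part, n)
        for i in range(n):
            for j in range(n):
                g[i][j] += pg[i][j]
    return g


if __name__ == "__main__":
    # A = [[1, 2], [3, 4], [0, 5]] split into two row partitions.
    parts = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 5.0]]]
    print(gram(parts, 2))  # [[10.0, 14.0], [14.0, 45.0]]
```

Both kernels produce the same partial Gramian; they differ only in how much zero-valued work they skip, which is why matrix density is a natural criterion for deciding how to execute each task.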
Database: OpenAIRE