xSVM: Scalable Distributed Kernel Support Vector Machine Training

Authors: Shaoshuai Zhang, Panruo Wu, Ruchi Shah, Ying Lin
Year of publication: 2019
Subject:
Source: IEEE BigData
Description: Kernel Support Vector Machine (SVM) is a popular machine learning model for classification and regression. A significant challenge of large-scale kernel SVM is the size of the $n \times n$ Gram matrix, which cannot be stored or processed efficiently when the training data set is large (e.g., $n$ in the millions). This paper proposes a novel SVM training algorithm and a parallelization strategy that can efficiently train on data sets with millions of samples on thousands of processors. The algorithm consists of an accurate, fast, and scalable low-rank matrix approximation based on random projection, and a primal-dual interior point method to solve the approximated optimization problem. We demonstrate that xSVM is fast, scalable, and accurate on large-scale data sets and large numbers of computing nodes. Compared to a state-of-the-art distributed kernel L1-SVM system, xSVM is consistently several times faster, with accuracy comparable to the exact model trained by LIBSVM.
Database: OpenAIRE
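
The low-rank approximation idea mentioned in the description can be illustrated with a minimal single-machine sketch. This is not the xSVM algorithm itself, whose details are not given in this record; it is a generic randomized range-finder approach, assuming an RBF kernel and a Gram matrix small enough to form explicitly. The function names (`rbf_gram`, `random_projection_lowrank`) and the parameters `gamma`, `rank`, and `oversample` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_gram(X, gamma):
    """Dense RBF Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    Assumption: n is small enough for the n x n matrix to fit in memory.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def random_projection_lowrank(K, rank, oversample=10, seed=0):
    """Return Z of shape (n, rank) such that K is approximated by Z @ Z.T.

    Generic randomized sketch (an illustrative stand-in, not the paper's method):
    1. project K onto a random Gaussian test matrix,
    2. orthonormalize the result to get a basis Q for the dominant range,
    3. eigendecompose the small matrix Q.T @ K @ Q and keep the top `rank` terms.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    k = min(n, rank + oversample)

    omega = rng.standard_normal((n, k))   # random projection directions
    Q, _ = np.linalg.qr(K @ omega)        # orthonormal basis, shape (n, k)

    B = Q.T @ K @ Q                       # small k x k compressed matrix
    B = 0.5 * (B + B.T)                   # symmetrize against round-off
    w, V = np.linalg.eigh(B)
    w = np.clip(w, 0.0, None)             # K is PSD; drop tiny negative values

    top = np.argsort(w)[::-1][:rank]      # indices of the largest eigenvalues
    return Q @ (V[:, top] * np.sqrt(w[top]))
```

With a factor `Z` of shape `(n, rank)`, downstream solvers can work with `Z` instead of the full Gram matrix, so storage drops from order n² to order n times `rank`. How xSVM builds and distributes its factor across processors is not specified here.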