Residual Based Sampling for Online Low Rank Approximation

Autor:	Aditya Bhaskara, Silvio Lattanzi, Morteza Zadimoghaddam, Sergei Vassilvitskii
Rok vydání:	2020
Předmět:	Adaptive sampling Computer science Low-rank approximation 0102 computer and information sciences 010501 environmental sciences Residual 01 natural sciences Small set 010201 computation theory & mathematics Norm (mathematics) Principal component analysis Online algorithm Algorithm Subspace topology 0105 earth and related environmental sciences
Zdroj:	ITA FOCS
DOI:	10.1109/ita50056.2020.9244974
Popis:	We propose online algorithms for Column Subset Selection (CSS) and Principal Component Analysis (PCA), two methods that are widely employed for data analysis, summarization, and visualization. Given a data matrix A that is revealed one column at a time, the online CSS problems asks to keep a small set of columns, S, that best approximates the space spanned by the columns of A. As each column arrives, the algorithm must irrevocably decide whether to add it to S, or to ignore it. In the online PCA problem, the goal is to output a projection of each column to a low dimensional subspace. In other words, the algorithm must provide an embedding for each column as it arrives, which cannot be changed as new columns arrive.While both of these problems have been studied in the online setting, only additive approximations were known prior to our work. The core of our approach is an adaptive sampling technique that gives a practical and efficient algorithm for both of these problems. We prove that by sampling columns using their "residual norm" (i.e. their norm orthogonal to directions sampled so far), we end up with a significantly better dependence between the number of columns sampled, and the desired error in the approximation.We further show how to combine our algorithm "in series" with prior algorithms. In particular, using the results of Boutsidis et al. [5] and Frieze et al. [15] that have additive guarantees, we show how to improve the bounds on the error of our algorithm.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d4e47596f5ee54bc9a9640cb49860f1b https://doi.org/10.1109/ita50056.2020.9244974 Zobrazit plný text záznamu