DROP

Autor:	Sahaana Suri, Peter Bailis
Rok vydání:	2019
Předmět:	Computer science 020204 information systems Dimensionality reduction Principal component analysis Singular value decomposition 0202 electrical engineering electronic engineering information engineering Overhead (computing) 020207 software engineering 02 engineering and technology Algorithm Curse of dimensionality k-nearest neighbors algorithm
Zdroj:	DEEM@SIGMOD
Popis:	Dimensionality reduction (DR) is critical in scaling machine learning pipelines: by reducing input dimensionality in exchange for a preprocessing overhead, DR enables faster end-to-end runtime. Principal component analysis (PCA) is a DR standard, but can be computationally expensive: classically O(dn2 + n3) for an n-dimensional dataset of d points. Theoretical work has optimized PCA via iterative, sample-based stochastic methods. However, these methods execute for a fixed number of iterations or to convergence, sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after processing small (e.g., 1%) samples of data. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition (SVD)-based PCA, and 16x over conventional DR methods in end-to-end nearest neighbor workloads.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_________::9409f3ec4071a555d13cfb731a5d51ea https://doi.org/10.1145/3329486.3329490 Zobrazit plný text záznamu