DROP
Autor: | Sahaana Suri, Peter Bailis |
---|---|
Rok vydání: | 2019 |
Předmět: |
Computer science
020204 information systems Dimensionality reduction Principal component analysis Singular value decomposition 0202 electrical engineering electronic engineering information engineering Overhead (computing) 020207 software engineering 02 engineering and technology Algorithm Curse of dimensionality k-nearest neighbors algorithm |
Zdroj: | DEEM@SIGMOD |
Popis: | Dimensionality reduction (DR) is critical in scaling machine learning pipelines: by reducing input dimensionality in exchange for a preprocessing overhead, DR enables faster end-to-end runtime. Principal component analysis (PCA) is a DR standard, but can be computationally expensive: classically O(dn2 + n3) for an n-dimensional dataset of d points. Theoretical work has optimized PCA via iterative, sample-based stochastic methods. However, these methods execute for a fixed number of iterations or to convergence, sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after processing small (e.g., 1%) samples of data. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition (SVD)-based PCA, and 16x over conventional DR methods in end-to-end nearest neighbor workloads. |
Databáze: | OpenAIRE |
Externí odkaz: |