Popis: |
Approximate nearest neighbor search (epsilon-ANN) in high dimensions has been mainly addressed by Locality Sensitive Hashing (LSH), which has complexity with polynomial dependence in dimension, sublinear query time, but subquadratic space requirement. We introduce a new “low-quality” embedding for metric spaces requiring that, for some query, there exists an approximate nearest neighbor among the pre-images of its k > 1 approximate nearest neighbors in the target space. In Euclidean spaces, we employ random projections to a dimension inversely proportional to k. Our approach extends to the decision problem with witness of checking whether there exists an approximate near neighbor; this also implies a solution for epsilon-ANN. After dimension reduction, we store points in a uniform grid of side length epsilon/root d’, where d’ is the reduced dimension. Given a query, we explore cells intersecting the unit ball around the query. This data structure requires linear space and query time in O(dn(rho)), rho approximate to 1 - epsilon(2)/ log(1/epsilon); where n denotes input cardinality and d space dimension. Bounds are improved for doubling subsets via r-nets. We present our implementation for epsilon-ANN in C++ and experiments for d |