Online Nearest Neighbor Search in Binary Space
Autor: | Hassan Ashtiani, Sepehr Eghbali, Ladan Tahvildari |
---|---|
Rok vydání: | 2017 |
Předmět: |
Computer science
business.industry Computer Science::Information Retrieval Feature vector Nearest neighbor search Pattern recognition Hamming distance 02 engineering and technology Data structure Substring k-nearest neighbors algorithm ComputingMethodologies_PATTERNRECOGNITION 020204 information systems 0202 electrical engineering electronic engineering information engineering 020201 artificial intelligence & image processing Binary code Artificial intelligence business Hamming weight Hamming code |
Zdroj: | ICDM |
Popis: | We revisit the K Nearest Neighbors (KNN) problem in large binary datasets which is of major importance in several applied areas. The goal is to find the K nearest items in a dataset to a query point where both the query and the items lie in the Hamming cube. The problem is addressed in its online setting, that is, data items are inserted sequentially into the dataset. To accommodate efficient similarity search and fast insertion of new items, we propose a data structure that partitions the feature space based on the Hamming weights of the binary codes and their substrings. Empirical evaluations on a large-scale dataset show significant speedup over the linear scan baseline. |
Databáze: | OpenAIRE |
Externí odkaz: |