FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search
Autor: | Singh, Aditi, Subramanya, Suhas Jayaram, Krishnaswamy, Ravishankar, Simhadri, Harsha Vardhan |
---|---|
Rok vydání: | 2021 |
Předmět: | |
Druh dokumentu: | Working Paper |
Popis: | Approximate nearest neighbor search (ANNS) is a fundamental building block in information retrieval with graph-based indices being the current state-of-the-art and widely used in the industry. Recent advances in graph-based indices have made it possible to index and search billion-point datasets with high recall and millisecond-level latency on a single commodity machine with an SSD. However, existing graph algorithms for ANNS support only static indices that cannot reflect real-time changes to the corpus required by many key real-world scenarios (e.g. index of sentences in documents, email, or a news index). To overcome this drawback, the current industry practice for manifesting updates into such indices is to periodically re-build these indices, which can be prohibitively expensive. In this paper, we present the first graph-based ANNS index that reflects corpus updates into the index in real-time without compromising on search performance. Using update rules for this index, we design FreshDiskANN, a system that can index over a billion points on a workstation with an SSD and limited memory, and support thousands of concurrent real-time inserts, deletes and searches per second each, while retaining $>95\%$ 5-recall@5. This represents a 5-10x reduction in the cost of maintaining freshness in indices when compared to existing methods. Comment: 19 pages, 22 figures |
Databáze: | arXiv |
Externí odkaz: |