Binary Multidimensional Scaling for Hashing

Autor:	Zhouchen Lin, Yameng Huang
Rok vydání:	2018
Předmět:	Theoretical computer science Computer science Nearest neighbor search Hash function 02 engineering and technology 010501 environmental sciences Linear hashing Rolling hash 01 natural sciences K-independent hashing Locality-sensitive hashing Open addressing 0202 electrical engineering electronic engineering information engineering Consistent hashing Computer Science::Databases 0105 earth and related environmental sciences Universal hashing Dynamic perfect hashing 2-choice hashing Computer Graphics and Computer-Aided Design Hash table Hopscotch hashing Cuckoo hashing SUHA Locality preserving hashing 020201 artificial intelligence & image processing Feature hashing Extendible hashing Perfect hash function Software Double hashing
Zdroj:	IEEE Transactions on Image Processing. 27:406-418
ISSN:	1941-0042 1057-7149
DOI:	10.1109/tip.2017.2759250
Popis:	Hashing is a useful technique for fast nearest neighbor search due to its low storage cost and fast query speed. Unsupervised hashing aims at learning binary hash codes for the original features so that the pairwise distances can be best preserved. While several works have targeted on this task, the results are not satisfactory mainly due to the over-simplified model. In this paper, we propose a unified and concise unsupervised hashing framework, called binary multidimensional scaling , which is able to learn the hash code for distance preservation in both batch and online mode. In the batch mode, unlike most existing hashing methods, we do not need to simplify the model by predefining the form of hash map. Instead, we learn the binary codes directly based on the pairwise distances among the normalized original features by alternating minimization. This enables a stronger expressive power of the hash map. In the online mode, we consider the holistic distance relationship between current query example and those we have already learned, rather than only focusing on current data chunk. It is useful when the data come in a streaming fashion. Empirical results show that while being efficient for training, our algorithm outperforms state-of-the-art methods by a large margin in terms of distance preservation, which is practical for real-world applications.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::753f3d637d15a9dfbeabedf30e31944f https://doi.org/10.1109/tip.2017.2759250 Zobrazit plný text záznamu