Analysis of SparseHash: An efficient embedding of set-similarity via sparse projections

Autor: Sophie M. Fosson, Tiziano Bianchi, Chiara Ravazzi, Enrico Magli, Diego Valsesia
Rok vydání: 2019
Předmět:
FOS: Computer and information sciences
Jaccard index
Similarity (geometry)
Set-similarity
Computer science
Computer Vision and Pattern Recognition (cs.CV)
Hash function
Computer Science - Computer Vision and Pattern Recognition
Jaccard similarity
Inference
02 engineering and technology
01 natural sciences
Set (abstract data type)
Artificial Intelligence
Computer Science - Data Structures and Algorithms
0103 physical sciences
Euclidean geometry
0202 electrical engineering
electronic engineering
information engineering

Data Structures and Algorithms (cs.DS)
010306 general physics
Sparse matrix
Locality-sensitive hashing
business.industry
Pattern recognition
Embeddings
Sparse random projections
Signal Processing
Embedding
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
Artificial intelligence
business
Software
Zdroj: Pattern recognition letters 128 (2019): 93–99. doi:10.1016/j.patrec.2019.08.014
info:cnr-pdr/source/autori:Valsesia, Diego; Fosson, Sophie M.; Ravazzi, Chiara; Bianchi, Tiziano; Magli, Enrico/titolo:Analysis of SparseHash: An efficient embedding of set-similarity via sparse projections/doi:10.1016%2Fj.patrec.2019.08.014/rivista:Pattern recognition letters/anno:2019/pagina_da:93/pagina_a:99/intervallo_pagine:93–99/volume:128
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2019.08.014
Popis: Embeddings provide compact representations of signals in order to perform efficient inference in a wide variety of tasks. In particular, random projections are common tools to construct Euclidean distance-preserving embeddings, while hashing techniques are extensively used to embed set-similarity metrics, such as the Jaccard coefficient. In this letter, we theoretically prove that a class of random projections based on sparse matrices, called SparseHash, can preserve the Jaccard coefficient between the supports of sparse signals, which can be used to estimate set similarities. Moreover, besides the analysis, we provide an efficient implementation and we test the performance in several numerical experiments, both on synthetic and real datasets.
Comment: 25 pages, 6 figures
Databáze: OpenAIRE