MSQ-Index: A Succinct Index for Fast Graph Similarity Search
Autor: | Jun Huan, Lei Zou, Hongwei Huo, Weiguo Zheng, Xiaoyang Chen, Jeffrey Scott Vitter |
---|---|
Rok vydání: | 2021 |
Předmět: |
Graph database
Boosting (machine learning) Computer science Nearest neighbor search Search engine indexing 02 engineering and technology computer.software_genre Data structure Graph Computer Science Applications Succinct data structure Range searching Computational Theory and Mathematics 020204 information systems Scalability 0202 electrical engineering electronic engineering information engineering Data mining computer Information Systems |
Zdroj: | IEEE Transactions on Knowledge and Data Engineering. 33:2654-2668 |
ISSN: | 2326-3865 1041-4347 |
DOI: | 10.1109/tkde.2019.2954527 |
Popis: | Graph similarity search under the graph edit distance constraint has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with tens of millions of graphs. In this article, we present a succinct index that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, the space usage of our index requires only 5–15 percent of the previous state-of-the-art indexing size while at the same time achieving several times acceleration in query time on the tested data. We also improve the query performance by augmenting the global filter with range searching, which allows us to perform similarity search in a reduced region. In addition, we propose two effective lower bounds together with a boosting technique to obtain the smallest possible candidate set. Extensive experiments demonstrate that our proposed approach is superior both in space and filtering to the state-of-the-art approaches. To the best of our knowledge, our index is the first in-memory index for this problem that successfully scales to cope with the large dataset of 25 million chemical structure graphs from the PubChem dataset. The source code is available online. |
Databáze: | OpenAIRE |
Externí odkaz: |