MSQ-Index: A Succinct Index for Fast Graph Similarity Search

Autor: Jun Huan, Lei Zou, Hongwei Huo, Weiguo Zheng, Xiaoyang Chen, Jeffrey Scott Vitter
Rok vydání: 2021
Předmět:
Zdroj: IEEE Transactions on Knowledge and Data Engineering. 33:2654-2668
ISSN: 2326-3865
1041-4347
DOI: 10.1109/tkde.2019.2954527
Popis: Graph similarity search under the graph edit distance constraint has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with tens of millions of graphs. In this article, we present a succinct index that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, the space usage of our index requires only 5–15 percent of the previous state-of-the-art indexing size while at the same time achieving several times acceleration in query time on the tested data. We also improve the query performance by augmenting the global filter with range searching, which allows us to perform similarity search in a reduced region. In addition, we propose two effective lower bounds together with a boosting technique to obtain the smallest possible candidate set. Extensive experiments demonstrate that our proposed approach is superior both in space and filtering to the state-of-the-art approaches. To the best of our knowledge, our index is the first in-memory index for this problem that successfully scales to cope with the large dataset of 25 million chemical structure graphs from the PubChem dataset. The source code is available online.
Databáze: OpenAIRE