Graph threshold algorithm

Autor: Wookey Lee, James Jung-Hoon Lee, Charles Cheolgi Lee, Justin J. Song, Tae-Chang Jo
Rok vydání: 2021
Předmět:
Zdroj: The Journal of Supercomputing. 77:9827-9847
ISSN: 1573-0484
0920-8542
DOI: 10.1007/s11227-021-03665-z
Popis: Recently more and more information sources are connected together and become a sort of complex graphs that can be exploited not only as a structured and semi-structured data such as rdb or xml, RDF or NoSQL, but also as many kinds of unstructured data such as web, bioinformatics, genometrics, patents, social media, knowlege graphs, IoT, hidden graph from deep learning results. State of the art studies have suggested methods of presenting data as hyper-graphs in search queries and finding search results in subgraphs or graph embeddings rather than a list of individual node results. We study the problem of retrieving top-k graph results with the query relevances; that is, given a set of query keywords Q on a graph G, we aim to find a subgraph g of G such that g is highly related to Q and closely linked under g. In order to consider the relevant graph results and the connectivity simultaneously, we present an effective algorithm graph threshold algorithm (GTA) based on a threshold algorithm (TA) which works efficiently in non-graph structure. We show that GTA does not need unnecessary searches under given objective functions, and prove the existence of an upper bound of the size of subgraph for top-k results, called $${\textit{hop}}_{{\textit{max}}}$$ , which makes it efficient to find the combined results. Finally, we conduct the performance studies on real and synthetic graphs, which demonstrate that our algorithm significantly outperforms conventional approaches with respect to time complexity and cost consumption.
Databáze: OpenAIRE
Popis
Abstrakt:Recently more and more information sources are connected together and become a sort of complex graphs that can be exploited not only as a structured and semi-structured data such as rdb or xml, RDF or NoSQL, but also as many kinds of unstructured data such as web, bioinformatics, genometrics, patents, social media, knowlege graphs, IoT, hidden graph from deep learning results. State of the art studies have suggested methods of presenting data as hyper-graphs in search queries and finding search results in subgraphs or graph embeddings rather than a list of individual node results. We study the problem of retrieving top-k graph results with the query relevances; that is, given a set of query keywords Q on a graph G, we aim to find a subgraph g of G such that g is highly related to Q and closely linked under g. In order to consider the relevant graph results and the connectivity simultaneously, we present an effective algorithm graph threshold algorithm (GTA) based on a threshold algorithm (TA) which works efficiently in non-graph structure. We show that GTA does not need unnecessary searches under given objective functions, and prove the existence of an upper bound of the size of subgraph for top-k results, called $${\textit{hop}}_{{\textit{max}}}$$ , which makes it efficient to find the combined results. Finally, we conduct the performance studies on real and synthetic graphs, which demonstrate that our algorithm significantly outperforms conventional approaches with respect to time complexity and cost consumption.
ISSN:15730484
09208542
DOI:10.1007/s11227-021-03665-z