Graph threshold algorithm
Autor: | Wookey Lee, James Jung-Hoon Lee, Charles Cheolgi Lee, Justin J. Song, Tae-Chang Jo |
---|---|
Rok vydání: | 2021 |
Předmět: |
020203 distributed computing
Computer science Order (ring theory) Unstructured data 02 engineering and technology computer.file_format NoSQL computer.software_genre Upper and lower bounds Theoretical Computer Science Set (abstract data type) Hardware and Architecture 0202 electrical engineering electronic engineering information engineering sort RDF computer Algorithm Time complexity Software Information Systems |
Zdroj: | The Journal of Supercomputing. 77:9827-9847 |
ISSN: | 1573-0484 0920-8542 |
DOI: | 10.1007/s11227-021-03665-z |
Popis: | Recently more and more information sources are connected together and become a sort of complex graphs that can be exploited not only as a structured and semi-structured data such as rdb or xml, RDF or NoSQL, but also as many kinds of unstructured data such as web, bioinformatics, genometrics, patents, social media, knowlege graphs, IoT, hidden graph from deep learning results. State of the art studies have suggested methods of presenting data as hyper-graphs in search queries and finding search results in subgraphs or graph embeddings rather than a list of individual node results. We study the problem of retrieving top-k graph results with the query relevances; that is, given a set of query keywords Q on a graph G, we aim to find a subgraph g of G such that g is highly related to Q and closely linked under g. In order to consider the relevant graph results and the connectivity simultaneously, we present an effective algorithm graph threshold algorithm (GTA) based on a threshold algorithm (TA) which works efficiently in non-graph structure. We show that GTA does not need unnecessary searches under given objective functions, and prove the existence of an upper bound of the size of subgraph for top-k results, called $${\textit{hop}}_{{\textit{max}}}$$ , which makes it efficient to find the combined results. Finally, we conduct the performance studies on real and synthetic graphs, which demonstrate that our algorithm significantly outperforms conventional approaches with respect to time complexity and cost consumption. |
Databáze: | OpenAIRE |
Externí odkaz: |