Data structure set-trie for storing and querying sets: Theoretical and empirical analysis.
Autor: | Savnik I; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia.; Faculty of Information Studies, Novo Mesto, Slovenia., Akulich M; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia., Krnc M; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia.; Faculty of Information Studies, Novo Mesto, Slovenia.; Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia., Škrekovski R; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia.; Faculty of Information Studies, Novo Mesto, Slovenia.; Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia. |
---|---|
Jazyk: | angličtina |
Zdroj: | PloS one [PLoS One] 2021 Feb 10; Vol. 16 (2), pp. e0245122. Date of Electronic Publication: 2021 Feb 10 (Print Publication: 2021). |
DOI: | 10.1371/journal.pone.0245122 |
Abstrakt: | Set containment operations form an important tool in various fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a set-trie data structure for storing sets is considered, along with the efficient algorithms for the corresponding set containment operations. We present the mathematical and empirical study of the set-trie. In the mathematical study, the relevant upper-bounds on the efficiency of its expected performance are established by utilizing a natural probabilistic model. In the empirical study, we give insight into how different distributions of input data impact the efficiency of set-trie. Using the correct parameters for those randomly generated datasets, we expose the key sources of the input sensitivity of set-trie. Finally, the empirical comparison of set-trie with the inverted index is based on the real-world datasets containing sets of low cardinality. The comparison shows that the running time of set-trie consistently outperforms the inverted index by orders of magnitude. Competing Interests: The authors have declared that no competing interests exist. |
Databáze: | MEDLINE |
Externí odkaz: | |
Nepřihlášeným uživatelům se plný text nezobrazuje | K zobrazení výsledku je třeba se přihlásit. |