In-database connected component analysis
Autor: | Bögeholz, Harald, Brand, Michael, Todor, Radu-Alexandru |
---|---|
Rok vydání: | 2018 |
Předmět: | |
Druh dokumentu: | Working Paper |
Popis: | We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm described is a linear-space, randomised algorithm, always terminating with the correct answer but subject to a stochastic running time, such that for any $\epsilon>0$ and any input graph $G=\langle V, E \rangle$ the algorithm terminates after $\mathop{\text{O}}(\log |V|)$ SQL queries with probability of at least $1-\epsilon$, which we show empirically to translate to a quasi-linear runtime in practice. Comment: major revision with new datasets |
Databáze: | arXiv |
Externí odkaz: |