Data partition optimisation for column-family NoSQL databases

Authors: Li Yung Ho, Meng Ju Hsieh, Pangfeng Liu, Jan Jan Wu
Publication year: 2017
Subject:
Source: International Journal of Big Data Intelligence. 4:263
ISSN: 2053-1397
2053-1389
DOI: 10.1504/ijbdi.2017.10006848
Description: Data conversion has become an emerging topic in the big data era. To cope with rapid data growth, legacy or existing relational databases need to be converted into NoSQL column-family databases in order to achieve better scalability. The conversion from SQL to NoSQL databases requires combining small, normalised SQL data tables into larger NoSQL data tables, a process called denormalisation. A challenging issue in data conversion is how to group the denormalised columns of a large data table into 'families' so that query processing performs well. In this paper, we propose an efficient heuristic algorithm, the graph-based partition algorithm (GPA), to address this problem. We use the TPC-C and TPC-H benchmarks to demonstrate that the column families produced by GPA are very efficient for large-scale data processing.
Database: OpenAIRE