A Data Placement Strategy for Distributed Document-oriented Data Warehouse.

Authors: Khalil, Abdelhak; Belaissaoui, Mustapha; Toufik, Fouad
Subject:
Source: IAENG International Journal of Computer Science; Dec2023, Vol. 50 Issue 4, p1541-1549, 9p
Abstract: Within the big data phenomenon, cluster computing has attracted special attention for its impressive ability to process a vast amount of data. Hadoop cluster is a promising cluster computing framework for implementing big data warehouses and conducting big data analysis, thanks to its distributed file system and MapReduce paradigm. In this paper, we propose a new data placement strategy for a document-oriented data warehouse within the distributed environment of Hadoop. Our contribution includes formalizing the logical model and cube building operators. First, we present the cube building algorithm's processing using the MapReduce paradigm, and then we explore the possibility of accelerating the process by using Spark instead. To evaluate the proposed framework in terms of OLAP cube construction cost, we conducted experiments on a physical cluster, which yielded promising results, specifically, that the proposed framework enables efficient data placement and significantly speeds up cube building compared to a similar OLAP infrastructure chosen from existing literature. [ABSTRACT FROM AUTHOR]
Database: Supplemental Index