The evolution of data storage architectures: examining the secure value of the Data Lakehouse

Authors: Janssen, Nathalie; Ilayperuma, Tharaka; Jayasinghe, Jeewanie; Bukhsh, Faiza; Daneva, Maya
Source: Journal of Data, Information and Management; 20240101, Issue: Preprints p1-26, 26p
Abstract: The digital shift in society is driving continuous growth in data. However, choosing a suitable storage architecture to efficiently store, process, and manage data from numerous sources remains a challenge. Currently, three generations of storage architecture are in practice, the most recent of which is the data lakehouse. Given its novelty, little research has examined the rationale behind its introduction or its strengths and weaknesses. To fill this gap, this study investigates the secure value (comparative strengths) of the data lakehouse architecture relative to data warehouse and data lake architectures. Based on a comprehensive systematic literature review, we propose a data storage evolution model showing the comparative strengths and weaknesses of data warehouse, data lake, and data lakehouse architectures. Using the proposed model and expert interviews, this study demonstrates the secure value of the data lakehouse compared to the preceding architectures. In addition, the study presents a high-level view of the strengths that the data lakehouse shares with both the data warehouse and the data lake. In essence, the artifact produced by this study can be used to explain the rationale behind the evolution of data storage architectures. Further, the proposed model can help practitioners study the trade-offs between different architectures in order to offer recommendations. Finally, the authors acknowledge that this study has several limitations, such as the limited sample size for the interviews and the bias inherent in a qualitative research approach. However, all available measures were taken to minimize the effects of these limitations.
Database: Supplemental Index