Performance Evaluation of a Data Lake Architecture via Modeling Techniques

Authors: Letizia Tanca, Giuseppe Serazzi, Marco Gribaudo, Enrico Barbierato
Publication year: 2021
Subject:
Source: Lecture Notes in Computer Science ISBN: 9783030918248
DOI: 10.1007/978-3-030-91825-5_7
Description: A Data Lake is a repository that stores heterogeneous data, both structured and unstructured. Its flexible organization allows users to dynamically reorganize and integrate the information they need for a given query or analysis. The success of an implementation depends on many factors, notably the distributed storage, the kind of media deployed, the data access protocols, and the network used. However, design flaws might become evident only in a later phase of system development, causing significant delays in complex projects. This article applies queueing network modeling to detect significant issues, such as bottlenecks and performance degradation, under different workload scenarios.
Database: OpenAIRE