Big Data Workflows: Locality-Aware Orchestration Using Software Containers
Authors: | Corodescu, Andrei-Alin; Nikolov, Nikolay; Khan, Akif Quddus; Soylu, Ahmet; Matskin, Mihhail; Payberah, Amir H.; Roman, Dumitru |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: | Big Data; Information Storage and Retrieval; 02 engineering and technology; TP1-1185; Biochemistry; Article; Analytical Chemistry; Workflow; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Orchestration; Electrical and Electronic Engineering; Instrumentation; Big data workflows; Data locality; Software containers; Chemical technology; Computational Biology; 020206 networking & telecommunications; Atomic and Molecular Physics and Optics; Software |
Source: | Sensors (Basel, Switzerland), Vol. 21, Issue 24, p. 8212 (2021) |
ISSN: | 1424-8220 1010-1683 |
Description: | The emergence of the Edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing Big Data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the Edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric Big Data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo Workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyse individual aspects affecting the performance of the overall solution. The work in this paper was partly funded by the EC H2020 project “DataCloud” (grant number 101016835) and the NFR project “BigDataMine” (grant number 309691). |
Database: | OpenAIRE |
External link: |