Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers

Authors: Yared Dejene Dessalk, Dumitru Roman, Akif Quddus Khan, Mihhail Matskin, Amir H. Payberah, Ahmet Soylu, Nikolay Nikolov
Language: English
Year of publication: 2021
Subject:
Source: Internet of Things
ISSN: 1010-1683
Description: Big Data processing, especially with the increasing proliferation of Internet of Things (IoT) technologies and the convergence of IoT, edge, and cloud computing technologies, involves handling massive and complex data sets on heterogeneous resources and incorporating different tools, frameworks, and processes to help organizations make sense of the data they collect from various sources. This set of operations, referred to as Big Data workflows, requires taking advantage of the elasticity of Cloud infrastructures for scalability. In this article, we present the design and prototype implementation of a Big Data workflow approach based on the use of software container technologies, message-oriented middleware (MOM), and a domain-specific language (DSL) to enable highly scalable workflow execution and abstract workflow definition. We demonstrate our system in a use case and a set of experiments that show the practical applicability of the proposed approach for the specification and scalable execution of Big Data workflows. Furthermore, we compare our proposed approach’s scalability with that of Argo Workflows – one of the most prominent tools in the area of Big Data workflows – and provide a qualitative evaluation of the proposed DSL and overall approach with respect to the existing literature. This work was partly funded by the EC H2020 projects “DataCloud: Enabling The Big Data Pipeline Lifecycle on the Computing Continuum” (Grant No. 101016835) and “COGNITWIN: Cognitive plants through proactive self-learning hybrid digital twins” (Grant No. 870130), and the NFR project “BigDataMine” (Grant No. 309691).
Database: OpenAIRE