Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers
Authors: Yared Dejene Dessalk, Dumitru Roman, Akif Quddus Khan, Mihhail Matskin, Amir H. Payberah, Ahmet Soylu, Nikolay Nikolov
Language: English
Year of publication: 2021
Subjects: Big Data; Big Data workflows; Domain-specific languages; Software containers; Internet of Things; Cloud computing; Workflow; Elasticity (cloud computing); Middleware; Container (abstract data type); Scalability; Software engineering; Artificial Intelligence; Computer science; Computer Science (miscellaneous); Engineering (miscellaneous); Computer Science Applications; Hardware and Architecture; Management of Technology and Innovation; Information Systems; Software; Datateknologi: 551 [VDP]; Computer technology: 551 [VDP]
Source: Internet of Things
ISSN: 1010-1683
Description: Big Data processing, especially with the increasing proliferation of Internet of Things (IoT) technologies and the convergence of IoT, edge, and cloud computing, involves handling massive and complex data sets on heterogeneous resources and incorporating different tools, frameworks, and processes to help organizations make sense of data collected from various sources. This set of operations, referred to as Big Data workflows, requires taking advantage of the elasticity of Cloud infrastructures for scalability. In this article, we present the design and prototype implementation of a Big Data workflow approach based on software container technologies, message-oriented middleware (MOM), and a domain-specific language (DSL) to enable highly scalable workflow execution and abstract workflow definition. We demonstrate our system in a use case and a set of experiments that show the practical applicability of the proposed approach for the specification and scalable execution of Big Data workflows. Furthermore, we compare the scalability of our proposed approach with that of Argo Workflows, one of the most prominent tools in the area of Big Data workflows, and provide a qualitative evaluation of the proposed DSL and overall approach with respect to the existing literature.

Funding: This work was partly funded by the EC H2020 projects ''DataCloud: Enabling The Big Data Pipeline Lifecycle on the Computing Continuum'' (Grant nr. 101016835) and ''COGNITWIN: Cognitive plants through proactive self-learning hybrid digital twins'' (Grant nr. 870130), and the NFR project ''BigDataMine'' (Grant nr. 309691).
Database: OpenAIRE
External link: