Modeling Big Data Processing Programs

Authors: Martin A. Musicante, Genoveva Vargas-Solar, Anamaria Martins Moreira, João Batista de Souza Neto
Contributors: Vargas-Solar, Genoveva, Base de Données (BD), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2)
Language: English
Year of publication: 2020
Subject:
Source: 23RD BRAZILIAN SYMPOSIUM ON FORMAL METHODS, Nov 2020, Ouro Preto, Brazil
Lecture Notes in Computer Science ISBN: 9783030638818
SBMF
Description: We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam, and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of data flow-based systems, namely operations over data (filtering, aggregation, join) and program execution, defined by data dependence between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data flow. This approach allows the data processing program specification to be agnostic of the target Big Data processing system. As a first application of the model, we used it to formalize mutation operators for the application of mutation testing in Big Data processing programs. The testing tool TRANSMUT-Spark implements these operators.
Database: OpenAIRE