Modeling Big Data Processing Programs
Authors: | Martin A. Musicante, Genoveva Vargas-Solar, Anamaria Martins Moreira, João Batista de Souza Neto |
---|---|
Contributors: | Vargas-Solar, Genoveva; Base de Données (BD), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS); Institut National des Sciences Appliquées de Lyon (INSA Lyon); Université de Lyon; Centre National de la Recherche Scientifique (CNRS); Université Claude Bernard Lyon 1 (UCBL); École Centrale de Lyon (ECL); Université Lumière - Lyon 2 (UL2) |
Language: | English |
Publication year: | 2020 |
Subject: | Big Data processing; Data processing; Data flow programming models; Data flow diagram; Directed acyclic graph; Monoid; Monoid Algebra; Petri Nets; Programming language; Computer science; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; Spark (mathematics); Join (sigma algebra); 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020207 software engineering |
Source: | 23rd Brazilian Symposium on Formal Methods (SBMF), Nov 2020, Ouro Preto, Brazil; Lecture Notes in Computer Science; ISBN: 9783030638818 |
Description: | We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of data flow-based systems, namely, operations over data (filtering, aggregation, join) and program execution, defined by the data dependencies between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data flow. This approach allows the specification of a data processing program to be agnostic of the target Big Data processing system. As a first application of the model, we used it to formalize mutation operators for mutation testing of Big Data processing programs. The testing tool TRANSMUT-Spark implements these operators. |
Database: | OpenAIRE |
External link: |
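
The description states that Monoid Algebra is used to model operations over distributed, partitioned datasets. The following is a minimal illustrative sketch of that general idea, not the formalization used in the paper: because a monoid's operation is associative and has an identity element, each partition can be folded independently and the partial results merged afterwards. The names `Monoid`, `fold`, `aggregate_partitioned` and `SUM` are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical helper: an identity element plus an associative operation."""
    zero: T
    combine: Callable[[T, T], T]


def fold(m: Monoid, values: Iterable) -> object:
    """Reduce a sequence of values with the monoid, starting from its identity."""
    return reduce(m.combine, values, m.zero)


def aggregate_partitioned(m: Monoid, partitions: List[list]) -> object:
    """Fold each partition on its own, then merge the per-partition results."""
    partials = [fold(m, p) for p in partitions]
    return fold(m, partials)


# Integer addition forms a monoid: identity 0, associative "+".
SUM = Monoid(0, lambda a, b: a + b)

# A toy dataset split into partitions (one of them empty).
partitions = [[1, 2, 3], [4, 5], [], [6]]

total = aggregate_partitioned(SUM, partitions)
flat_total = fold(SUM, [x for p in partitions for x in p])

print(total, flat_total)
```

Associativity is what makes the partition-wise result agree with a fold over the unpartitioned data, and the identity element gives empty partitions a well-defined contribution.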