Declarative Big Data Analysis for High-Energy Physics: TOTEM Use Case

Authors: Prasanth Kothuri, Enrico Bocchi, Maciej Malawski, Danilo Piparo, Jan Kaspar, Jakub Moscicki, Leszek Grzanka, Valentina Avati, Enrico Guiraud, Milosz Blaszkiewicz, Massimo Lamanna, Luca Canali, Javier Cervantes, Aleksandra Mnich, Shravan Murali, Diogo Castro, Enric Tejedor
Publication year: 2019
Subject:
Source: Lecture Notes in Computer Science, ISBN: 9783030293994
Euro-Par
DOI: 10.1007/978-3-030-29400-7_18
Description: The High-Energy Physics (HEP) community faces new data processing challenges caused by the expected growth in data volume resulting from the upgrade of the LHC accelerator. These challenges drive the demand for new approaches to data analysis. In this paper, we present a new declarative programming model that extends the popular ROOT data analysis framework, together with its distributed processing capability based on Apache Spark. The framework enables high-level operations on the data, familiar from other big data toolkits, while preserving compatibility with existing HEP data files and software. In experiments with a real analysis of TOTEM experiment data, we evaluate the scalability of this approach and its prospects for interactive processing of such large data sets. Moreover, we show that analysis code developed with the new model is portable between a production cluster at CERN and an external cluster hosted in the Helix Nebula Science Cloud, thanks to the bundle of services provided by Science Box.
Database: OpenAIRE