Dask Tutorial

Author: Doug Davis
Year of publication: 2022
DOI: 10.5281/zenodo.7140015
Description: Dask provides a foundation to natively scale Python libraries and applications. Dask collection libraries like dask.array and dask.dataframe mimic the ubiquitous APIs of NumPy and Pandas to parallelize and/or distribute NumPy-like and Pandas-like workflows. The dask.delayed collection supports parallelization of custom algorithms. In this tutorial we will introduce the core Dask collections, the concepts behind them (partitioned objects represented by task graphs), and Dask's distributed execution engine, which is compatible with common HEP batch compute systems. Finally, we will introduce recently developed Dask collections that support partitioned and distributed representations of awkward arrays and boost-histogram objects.
Database: OpenAIRE
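
Example (illustrative, not taken from the tutorial itself): the description refers to partitioned collections represented by task graphs and to dask.delayed for custom algorithms. A minimal sketch of both ideas, assuming dask and numpy are installed, might look like the following. The array size, the chunk size, and the helper functions square and add are hypothetical choices made only for illustration.

```python
import dask
import dask.array as da

# NumPy-like collection: a 1-D array split into 4 chunks (partitions).
# The size (1000) and chunk size (250) are arbitrary illustrative values.
x = da.arange(1_000, chunks=250)

# Building the expression only records tasks in a graph; nothing runs yet.
total = (x ** 2).sum()

# compute() executes the task graph and returns a concrete value.
result = total.compute()


# dask.delayed wraps ordinary functions so that calling them creates
# lazy tasks instead of running immediately (hypothetical helpers).
@dask.delayed
def square(v):
    return v * v


@dask.delayed
def add(a, b):
    return a + b


# The two square() tasks are independent, so a scheduler may run them in parallel.
lazy = add(square(3), square(4))
custom = lazy.compute()
```

Here x.numblocks reports how many partitions back the array, and both result and custom are only materialized when compute() is called. The same code can be pointed at a distributed scheduler without changing the collection logic.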