BinLPT: A Novel Workload-Aware Loop Scheduler for Irregular Parallel Loops

Autor: Patricia Della Mea Plentz, Márcio Castro, Pedro Henrique Penna, Henrique Freitas, François Broquedis, Jean-François Méhaut
Přispěvatelé: Universidade Federal de Santa Catarina = Federal University of Santa Catarina [Florianópolis] (UFSC), Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Laboratoire d'Informatique de Grenoble (LIG ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Compiler Optimization and Run-time Systems (CORSE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Pontifícia Universidade Católica de Minas Gerais (PUC Minas), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Penna, Pedro Henrique
Jazyk: angličtina
Rok vydání: 2017
Předmět:
Zdroj: Simpósio em Sistemas Computacionais de Alto Desempenho
Simpósio em Sistemas Computacionais de Alto Desempenho, Oct 2017, Campinas, Brazil
Popis: National audience; Workload-aware loop schedulers were introduced to deliver better performance than classical strategies, but they present limitations on work-load estimation, chunk scheduling and integrability with applications. Targeting these challenges, in this work we propose a novel workload-aware loop sched-uler that is called BinLPT and it is based on three features. First, it relies on some user-supplied estimation of the workload of the target parallel loop. Second , BinLPT uses a greedy bin packing heuristic to adaptively partition the iteration space in several chunks. The maximum number of chunks to be produced is a parameter that may be fine-tuned. Third, it schedules chunks of iterations using a hybrid scheme based on the LPT rule and on-demand scheduling. We integrated BinLPT in OpenMP, and we evaluated its performance in a large-scale NUMA machine using a synthetic kernel and 3D N-Body Simulations. Our results revealed that BinLPT improves performance over OpenMP's strategies by up to 45.13% and 37.15% in the synthetic and application kernels, respectively.
Databáze: OpenAIRE