Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales
Author: | Xiaobing Zhou, Ke Wang, Tonglin Li, Ioan Raicu, Michael Lang, Iman Sadooghi, Kan Qiao |
---|---|
Publication year: | 2015 |
Subject: |
020203 distributed computing; Computer Networks and Communications; Computer science; Distributed computing; 020207 software engineering; 02 engineering and technology; Dynamic priority scheduling; Load balancing (computing); Round-robin scheduling; Fair-share scheduling; Computer Science Applications; Theoretical Computer Science; Scheduling (computing); Fixed-priority pre-emptive scheduling; Computational Theory and Mathematics; Two-level scheduling; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Data-intensive computing; Queue; Software |
Source: | Concurrency and Computation: Practice and Experience. 28:70-94 |
ISSN: | 1532-0634 1532-0626 |
DOI: | 10.1002/cpe.3617 |
Description: | Data-driven programming models such as many-task computing (MTC) have been prevalent for running data-intensive scientific applications. MTC applies over-decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as there are compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data-intensive applications. Our previous research proposed a data-aware work-stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key-value store was used to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd. |
Database: | OpenAIRE |