MPI jobs within MPI jobs: A practical way of enabling task-level fault-tolerance in HPC workflows
Authors: | Li Tang, Matthieu Dorier, Tong Shu, Matthew Wolf, Justin M. Wozniak, Robert Ross, Norbert Podhorszki, Tahsin Kurc |
---|---|
Year of publication: | 2019 |
Subject: |
Flexibility (engineering); Multi-core processor; Computer Networks and Communications; Computer science; Process (engineering); Interoperability; Networking & telecommunications; Fault tolerance; Engineering and technology; Workflow; Hardware and Architecture; Electrical engineering, electronic engineering, information engineering; Operating system; Artificial intelligence & image processing; Throughput (business); Software; Workflow management system |
Source: | Future Generation Computer Systems. 101:576-589 |
ISSN: | 0167-739X |
Description: | While the use of workflows for HPC is growing, MPI interoperability remains a challenge for workflow management systems. The MPI standard and/or its implementations provide a number of ways to build multiple-programs-multiple-data (MPMD) applications. These methods have limitations related to fault tolerance and are not easy to use. In this paper, we advocate for a novel MPI_Comm_launch function acting as the parallel counterpart of a system(3) call. MPI_Comm_launch allows a child MPI application to be launched inside the resources originally held by processes of a parent MPI application. Two important aspects of MPI_Comm_launch are that it pauses the calling process and that it runs the child processes on the parent’s CPU cores, but in an isolated manner with respect to memory. This function makes it easier to build MPMD applications with well-decoupled subtasks. We show how this feature can provide better flexibility and better fault tolerance in ensemble simulations and HPC workflows. We report results showing a 2× throughput improvement for application workflows with faults, and scaling results for challenging workloads up to 256 nodes. |
Database: | OpenAIRE |
External link: |
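
Illustrative note: the description compares MPI_Comm_launch to a system(3) call, where the caller pauses, the child runs in its own memory space, and the caller then sees the child's exit status. The record does not give the function's signature, so the sketch below does not use MPI at all. It is a single-process analogy built on Python's standard `subprocess` module, and the helper names `launch_blocking` and `run_with_retries` are hypothetical. It only shows why a blocking, isolated launch makes task-level retry straightforward: a crashed child cannot corrupt the parent, which can simply relaunch it.

```python
import subprocess
import sys


def launch_blocking(argv):
    """Run a child program to completion and return its exit status.

    Assumption: this is only a single-process analogy to system(3).
    The caller blocks until the child exits, and the child runs in a
    separate address space, so a failure in the child cannot damage
    the caller's memory.
    """
    return subprocess.run(argv).returncode


def run_with_retries(argv, max_attempts=3):
    """Relaunch a failing task up to max_attempts times.

    Returns (last_exit_status, attempts_used). Both the name and the
    retry policy are hypothetical, chosen for illustration only.
    """
    status = 1
    for attempt in range(1, max_attempts + 1):
        status = launch_blocking(argv)
        if status == 0:
            return status, attempt
    return status, max_attempts


if __name__ == "__main__":
    # A child task that exits cleanly: one attempt is enough.
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    print(run_with_retries(ok))
```

In a workflow setting, the parent would play the role of the workflow engine and each child that of a subtask. The retry limit here is arbitrary and stands in for whatever recovery policy an actual system applies.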