iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems
Author: Feiyi Wang, Bharti Wadhwa, Kirk W. Cameron, Jon Bernard, Ali R. Butt, Sarah Neuwirth, Sarp Oral, Arnab K. Paul
Year of publication: 2019
Subject: distributed computing; resource contention; Lustre (file system); load balancing (computing); distributed data store; supercomputing; computer networks
Source: IPDPS
DOI: 10.1109/ipdps.2019.00070
Description: Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduce overall application performance. There are two conflicting challenges in mitigating this load imbalance: (i) optimizing system-wide data placement to maximize the bandwidth advantages of distributed storage servers, i.e., allocating I/O resources efficiently across applications and job runs; and (ii) optimizing client-centric data movement to minimize I/O request latency between clients and servers, i.e., allocating I/O resources efficiently in service of a single application and job run. Moreover, existing approaches that require application changes limit widespread adoption in commercial or proprietary deployments. We propose iez, an "end-to-end control plane" in which clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load information from the distributed storage servers for global data placement, while our design model leverages trace-based optimization techniques to minimize I/O request latency between clients and servers. We evaluate the proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR for large sequential writes, and a scientific application I/O kernel, HACC-I/O. Results show read and write performance improvements of up to 34% and 32%, respectively, compared to the state of the art.
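The placement idea described in the abstract amounts to steering each new write toward the storage servers that are currently least loaded. The following is a minimal, hypothetical Python sketch of that selection step, assuming a per-server load snapshot and a fixed stripe count; the server names, the load metric, and the selection policy are illustrative assumptions, not the iez implementation from the paper.

```python
# Illustrative sketch only: select the least-loaded storage servers for a new
# file's stripes from a real-time load snapshot. Load values and server names
# are hypothetical; they do not reflect the paper's actual control plane.
from typing import Dict, List


def select_servers(load_snapshot: Dict[str, float], stripe_count: int) -> List[str]:
    """Return the `stripe_count` servers with the lowest current load."""
    ranked = sorted(load_snapshot, key=load_snapshot.get)
    return ranked[:stripe_count]


if __name__ == "__main__":
    # Hypothetical per-server load (e.g., normalized outstanding write bytes).
    load = {"ost0": 0.82, "ost1": 0.10, "ost2": 0.47, "ost3": 0.05}
    # Place a 2-stripe file on the two least-loaded servers.
    print(select_servers(load, stripe_count=2))  # -> ['ost3', 'ost1']
```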
Database: OpenAIRE
External link: