D7.2.2 Exploitation of HPC Tools and Techniques

Author: Lysaght, Michael; Lindi, Bjorn; Vondrak, Vit; Donners, John; Tajchman, Marc
Language: English
Year of publication: 2014
Subject:
DOI: 10.5281/zenodo.6575526
Description: The objective of PRACE-3IP Work Package 7 (WP7), ‘Application Enabling and Support’, is to provide application enabling support for HPC application codes that are important to European researchers, so that these applications can effectively exploit multi-petaflop systems. This enabling activity uses the most promising tools, algorithms and standards for optimisation and parallel scaling that have recently been developed through research and experience in PRACE and other projects. In this deliverable, we report on the exploitation of new HPC tools and algorithms on different codes that are of interest to the European scientific and engineering research community. The report follows on naturally from the T7.2 deliverable D7.2.1, ‘A Report on the Survey of HPC Tools and Techniques’, which represented the first phase of activity in T7.2. Indeed, much of the exploitation work reported here was inspired by the comprehensive and in-depth analysis of state-of-the-art HPC tools and techniques in D7.2.1. The present report represents the second phase of activity in T7.2. We summarise how selected state-of-the-art HPC tools and techniques fared on real-world applications during the exploitation phase of T7.2. We focus on four topics that we have identified as important for enabling applications within WP7 on the road to exascale, and which mirror the four topics covered by the survey in D7.2.1: (1) Programming Models, (2) Scalable Libraries and Algorithms, (3) Debuggers and Profilers and (4) I/O Management Techniques. For a more detailed description of each of the exploitation projects summarised here, we refer the reader to the PRACE-3IP whitepaper associated with each of the 17 projects.
Programming Models
During the second phase of T7.2, we have exploited several programming models that were reported in D7.2.1 as having genuine potential on the road to exascale. In this deliverable we provide summary reports on the effectiveness of each of these HPC tools when enabling real applications with future exascale challenges in mind. In particular, we have focused on probing new (as well as under-exploited) features in mature programming models such as the Message Passing Interface (MPI), whose new features are now starting to confront the challenges of exascale computing. We have also exploited programming models targeting many-core architectures (where many-core typically implies more than 50 cores), which are likely to continue to feature in future large-scale systems as we move into the deep petascale era. As pointed out in the first phase of T7.2, the entry of new competitors into the many-core space has increased the relevance of open standards on the road to exascale, and we have therefore placed a particular focus on both mature and emerging open standards during the exploitation phase. In terms of more novel approaches to exploiting multi-petascale systems, we have also continued to draw on the European exascale projects, which offer experimental task-based models for programming multi-/many-core architectures. It is also worth noting that, although possibilities for exploiting Partitioned Global Address Space (PGAS) languages on real applications were genuinely explored during the exploitation phase, no real opportunities arose for doing so. This possibly reflects the continuing challenge of exploiting these powerful tools on existing large-scale codes on the road to exascale.

Scalable Libraries and Algorithms
During the exploitation phase we have undertaken six separate enablement projects, each focused on exploiting scalable libraries and algorithms.
In terms of challenges on the road to exascale, global communications in particular are known to be a severe barrier to scaling across large core counts, and many open questions remain on how, for example, Fast Fourier Transform (FFT) libraries will perform on future exascale systems. With this challenge in mind, we are happy to report on the successful implementation of alternative methods to FFT libraries in a real molecular dynamics application, which has the potential to significantly improve the scalability (and functionality) of the code on large node counts for certain problem types. Mesh generation and refinement have also been identified as posing major challenges on the road to exascale. As a result, we have also focused our efforts on both exploiting and improving state-of-the-art mesh tools for enabling Computational Fluid Dynamics (CFD) codes, which we report on here. We also report on the successful implementation of an Algebraic Multi-Grid (AMG) algorithm within a lattice Quantum Chromo-Dynamics (QCD) code, which has been shown to outperform existing techniques and shows real potential for enabling QCD applications on the road to exascale.

Debuggers and Profilers
In the survey of state-of-the-art HPC tools and techniques reported in D7.2.1, we found that all of the European exascale projects are concentrating effort on tools for debugging and performance analysis. This is deemed a necessity for efficient use of multi-petascale and future exascale systems: if we are to enable applications on such systems, we need as clear a view as possible of the barriers to achieving performance. At the same time, we noted in D7.2.1 that very little effort had gone into documenting the experience of using such tools on real applications within PRACE to date.
Here, we try to rectify this by reporting on how state-of-the-art profiling tools fared on real large-scale CFD and Computational Structural Mechanics (CSM) codes. We also report on how such tools can potentially be employed in combination with auto-tuning tools, which are of increasing interest on the road to exascale.

I/O Management Techniques
During our survey in the first phase of T7.2, as reported in D7.2.1, we found that users within PRACE have in general not been able to squeeze as much performance from existing parallel file systems as they have from computational hardware, particularly in the case of high-level I/O libraries. With this challenge in mind, we have carried out deeper investigations into extracting performance from file systems using state-of-the-art high-level libraries. We are happy to report that this work has improved the I/O performance of an astrophysics application on Tier-0 systems and shows real promise on the road to exascale.
Database: OpenAIRE