Showing 1 - 9 of 9 for search: '"Mohammed Sourouri"'
Published in:
ICPADS
We study the problem of contention for memory bandwidth between computation and communication in supercomputers that feature multicore CPUs. The problem arises when communication and computation are overlapped, and both operations compete for the sam
Author:
Nico Reissmann, Mohammed Sourouri, Per Gunnar Kjeldsberg, Johannes Langguth, Espen Birger Raknes, Daniel Hackenberg, Robert Schöne
Published in:
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17)
There is a consensus that exascale systems should operate within a power envelope of 20MW. Consequently, energy conservation is still considered as the most crucial constraint if such systems are to be realized. So far, most research on this topic ha
Published in:
2016 International Conference on High Performance Computing & Simulation (HPCS)
Using large-scale multicore systems to get the maximum performance and energy efficiency with manageable programmability is a major challenge. The partitioned global address space (PGAS) programming model enhances programmability by providing a globa
Published in:
ICCS
We present a novel method for 3D anisotropic front propagation and apply it to the simulation of geological folding. The new iterative algorithm has a simple structure and abundant parallelism, and is easily adapted to multithreaded architectures usi
Published in:
CSE
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traf
Published in:
Langguth, J; Sourouri, M; Lines, GT; Baden, SB; & Cai, X. (2015). Scalable Heterogeneous CPU-GPU Computations for Unstructured Tetrahedral Meshes. IEEE Micro, 35(4), 6-15. doi: 10.1109/MM.2015.70. UC San Diego: Retrieved from: http://www.escholarship.org/uc/item/70x7h2mk
IEEE Micro, vol 35, iss 4
A recent trend in modern high-performance computing environments is the introduction of powerful, energy-efficient hardware accelerators such as GPUs and Xeon Phi coprocessors. These specialized computing devices coexist with CPUs
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8531ef43ed231247a933338f0167d767
http://www.escholarship.org/uc/item/70x7h2mk
Published in:
ICPADS
In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simulta
Author:
Jakub Kružík, Joseph Schuchart, Kai Diethelm, Umbreen Sabir Mian, Wolfgang E. Nagel, Anamika Chowdhury, Andreas Gocht, Martin Beseda, Zakaria Bendifallah, Michael Gerndt, Magnus Jahre, Michael Lysaght, Mohammed Sourouri, Lubomír Říha, Radim Sojka, Daniel Hackenberg, Per Gunnar Kjeldsberg, Othman Bouizi, Venkatesh Kannan, Madhura Kumaraswamy, David Horák
Published in:
Computing
Energy efficiency is an important aspect of future exascale systems, mainly due to rising energy cost. Although High performance computing (HPC) applications are compute centric, they still exhibit varying computational characteristics in different r
Published in:
Journal of Mathematics in Industry. 4(1):10
Two new algorithms for numerical solution of static Hamilton-Jacobi equations are presented. These algorithms are designed to work efficiently on different parallel computing architectures, and numerical results for multicore CPU and GPU implementati