Showing 1 - 8 of 8 for search: '"Lukasz Wesolowski"'
Author:
Denis Sheahan, Janet Yang, Lei Tian, Valentin Andrei, Bilge Acun, Cyril Meurillon, Gisle Dankel, Peifeng Yu, Adnan Aziz, Christopher Gregg, Lukasz Wesolowski, Kim Hazelwood, Xiaoqiao Meng
Published in:
IEEE Micro. 41:101-112
In this article, we present a system to collectively optimize efficiency in a very large scale deployment of GPU servers for machine learning workloads at Facebook. Our system 1) measures and stores system-wide efficiency metrics for every executed w
Published in:
The International Journal of High Performance Computing Applications. 24:411-427
The emergence of new parallel architectures presents new challenges for application developers. Supercomputers vary in processor speed, network topology, interconnect communication characteristics and memory subsystems. This paper presents a performa
Author:
Nikhil Jain, Abhishek Gupta, Ehsan Totoni, Laxmikant V. Kale, Eric Mikida, Michael P. Robson, Yanhua Sun, Akhil Langer, Lukasz Wesolowski, Bilge Acun, Harshitha Menon, Xiang Ni
Published in:
SC
The advent of petascale computing has introduced new challenges (e.g. heterogeneity, system failure) for programming scalable parallel applications. Increased complexity and dynamism in science and engineering applications of today have further exace
Author:
Fabio Governato, Pritish Jetley, Thomas R. Quinn, Laxmikant V. Kale, Lukasz Wesolowski, Gengbin Zheng, Harshitha Menon
ChaNGa is an N-body cosmology simulation application implemented using Charm++. In this paper, we present the parallel design of ChaNGa and address many challenges arising due to the high dynamic ranges of clustered datasets. We focus on optimization
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::27bb6cb2bbf34b39fa6da7bc1aa01c4e
http://arxiv.org/abs/1409.1929
Author:
Thomas R. Quinn, Ramprasad Venkataraman, Lukasz Wesolowski, Laxmikant V. Kale, Pritish Jetley, Yanhua Sun, Jae-Seung Yeom, Keith R. Bisset, Abhishek Gupta
Published in:
ICPP
Fine-grained communication in supercomputing applications often limits performance through high communication overhead and poor utilization of network bandwidth. This paper presents Topological Routing and Aggregation Module (TRAM), a library that op
Author:
Martin Schulz, Dimitrios S. Nikolopoulos, Abhinav Bhatele, Madhav V. Marathe, Abhishek Gupta, Laxmikant V. Kale, Eric Bohm, Jae-Seung Yeom, Keith R. Bisset, Lukasz Wesolowski
Published in:
Yeom, J, Bhatele, A, Bisset, K, Bohm, E, Gupta, A, Kale, L V, Marathe, M, Nikolopoulos, D S, Schulz, M & Wesolowski, L 2014, Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium. Institute of Electrical and Electronics Engineers Inc., Washington, DC, USA, pp. 755-764. https://doi.org/10.1109/IPDPS.2014.83
IPDPS
Modeling dynamical systems represents an important application class covering a wide range of disciplines including but not limited to biology, chemistry, finance, national security, and health care. Such applications typically involve large-scale, i
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::75f67f0cfa00d7a416aea772f91fede3
https://pure.qub.ac.uk/en/publications/f14c6267-90a8-4bd9-a122-3f659e621a6d
Author:
Pritish Jetley, William Gropp, Abhinav Bhatele, Hormozd Gahvari, Laxmikant V. Kale, Lukasz Wesolowski
Published in:
IPDPS
The first Teraflop/s computer, the ASCI Red, became operational in 1997, and it took more than 11 years for a Petaflop/s performance machine, the IBM Roadrunner, to appear on the Top500 list. Efforts have begun to study the hardware and software chal
Published in:
SC
This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the contex