Showing 1 - 10
of 1,333
for search: '"Matsuoka, Satoshi"'
Author:
Zhang, Lingqi, Wahib, Mohamed, Chen, Peng, Meng, Jintao, Wang, Xiao, Endo, Toshio, Matsuoka, Satoshi
General Purpose Graphics Processing Units (GPGPU) are used in most of the top systems in HPC. The total capacity of scratchpad memory has increased by more than 40 times in the last decade. However, existing optimizations for stencil computations usi
External link:
http://arxiv.org/abs/2306.03336
Author:
Zhang, Lingqi, Wahib, Mohamed, Chen, Peng, Meng, Jintao, Wang, Xiao, Endo, Toshio, Matsuoka, Satoshi
Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the data local
External link:
http://arxiv.org/abs/2305.07390
In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, paper
External link:
http://arxiv.org/abs/2301.02432
Author:
Matsuoka, Satoshi, Domke, Jens, Wahib, Mohamed, Drozd, Aleksandr, Bair, Ray, Chien, Andrew A., Vetter, Jeffrey S., Shalf, John
A considerable amount of research and engineering went into designing proxy applications, which represent common high-performance computing workloads, to co-design and evaluate the current generation of supercomputers, e.g., RIKEN's Supercomputer Fug
External link:
http://arxiv.org/abs/2204.07336
Author:
Zhang, Lingqi, Wahib, Mohamed, Chen, Peng, Meng, Jintao, Wang, Xiao, Endo, Toshio, Matsuoka, Satoshi
Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel as many times as there are time/algorithm steps. The termination of each kernel implicitly acts as the barrier req
External link:
http://arxiv.org/abs/2204.02064
Author:
Domke, Jens, Vatai, Emil, Gerofi, Balazs, Kodama, Yuetsu, Wahib, Mohamed, Podobas, Artur, Mittal, Sparsh, Pericàs, Miquel, Zhang, Lingqi, Chen, Peng, Drozd, Aleksandr, Matsuoka, Satoshi
Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate
External link:
http://arxiv.org/abs/2204.02235
Author:
Farrell, Steven, Emani, Murali, Balma, Jacob, Drescher, Lukas, Drozd, Aleksandr, Fink, Andreas, Fox, Geoffrey, Kanter, David, Kurth, Thorsten, Mattson, Peter, Mu, Dawei, Ruhela, Amit, Sato, Kento, Shirahata, Koichi, Tabaru, Tsuguchika, Tsaris, Aristeidis, Balewski, Jan, Cumming, Ben, Danjo, Takumi, Domke, Jens, Fukai, Takaaki, Fukumoto, Naoto, Fukushi, Tatsuya, Gerofi, Balazs, Honda, Takumi, Imamura, Toshiyuki, Kasagi, Akihiko, Kawakami, Kentaro, Kudo, Shuhei, Kuroda, Akiyoshi, Martinasso, Maxime, Matsuoka, Satoshi, Mendonça, Henrique, Minami, Kazuki, Ram, Prabhat, Sawada, Takashi, Shankar, Mallikarjun, John, Tom St., Tabuchi, Akihiro, Vishwanath, Venkatram, Wahib, Mohamed, Yamazaki, Masafumi, Yin, Junqi
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of h
External link:
http://arxiv.org/abs/2110.11466
Author:
Ando, Kazuto, Bale, Rahul, Li, ChungGang, Matsuoka, Satoshi, Onishi, Keiji, Tsubokura, Makoto
The fastest supercomputer in 2020, Fugaku, has not only achieved digital transformation of epidemiology in allowing end-to-end, detailed quantitative modeling of COVID-19 transmissions for the first time, but also transformed the behavior of the enti
External link:
http://arxiv.org/abs/2110.09769
Author:
Chen, Peng, Wahib, Mohamed, Wang, Xiao, Takizawa, Shinichiro, Hirofuchi, Takahiro, Ogawa, Hirotaka, Matsuoka, Satoshi
Computed Tomography (CT) is a key 3D imaging technology that fundamentally relies on the compute-intense back-projection operation to generate 3D volumes. GPUs are typically used for back-projection in production CT devices. However, with the rise of
External link:
http://arxiv.org/abs/2104.13248
Author:
Domke, Jens, Vatai, Emil, Drozd, Aleksandr, Chen, Peng, Oyama, Yosuke, Zhang, Lingqi, Salaria, Shweta, Mukunoki, Daichi, Podobas, Artur, Wahib, Mohamed, Matsuoka, Satoshi
Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced
External link:
http://arxiv.org/abs/2010.14373