Search results - "Matsuoka, Satoshi"

Report

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt)

Authors: Zhang, Lingqi; Wahib, Mohamed; Chen, Peng; Meng, Jintao; Wang, Xiao; Endo, Toshio; Matsuoka, Satoshi

General Purpose Graphics Processing Units (GPGPU) are used in most of the top systems in HPC. The total capacity of scratchpad memory has increased by more than 40 times in the last decade. However, existing optimizations for stencil computations usi

External link: http://arxiv.org/abs/2306.03336

Report

Revisiting Temporal Blocking Stencil Optimizations

Authors: Zhang, Lingqi; Wahib, Mohamed; Chen, Peng; Meng, Jintao; Wang, Xiao; Endo, Toshio; Matsuoka, Satoshi

Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the data local

External link: http://arxiv.org/abs/2305.07390

Report

Myths and Legends in High-Performance Computing

Authors: Matsuoka, Satoshi; Domke, Jens; Wahib, Mohamed; Drozd, Aleksandr; Hoefler, Torsten

In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, paper

External link: http://arxiv.org/abs/2301.02432

Report

Preparing for the Future -- Rethinking Proxy Apps

Authors: Matsuoka, Satoshi; Domke, Jens; Wahib, Mohamed; Drozd, Aleksandr; Bair, Ray; Chien, Andrew A.; Vetter, Jeffrey S.; Shalf, John

A considerable amount of research and engineering went into designing proxy applications, which represent common high-performance computing workloads, to co-design and evaluate the current generation of supercomputers, e.g., RIKEN's Supercomputer Fug

External link: http://arxiv.org/abs/2204.07336

Report

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

Authors: Zhang, Lingqi; Wahib, Mohamed; Chen, Peng; Meng, Jintao; Wang, Xiao; Endo, Toshio; Matsuoka, Satoshi

Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel as many times as there are time/algorithm steps. The termination of each kernel implicitly acts as the barrier req

External link: http://arxiv.org/abs/2204.02064

Report

At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

Authors: Domke, Jens; Vatai, Emil; Gerofi, Balazs; Kodama, Yuetsu; Wahib, Mohamed; Podobas, Artur; Mittal, Sparsh; Pericàs, Miquel; Zhang, Lingqi; Chen, Peng; Drozd, Aleksandr; Matsuoka, Satoshi

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate

External link: http://arxiv.org/abs/2204.02235

Report

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of h

External link: http://arxiv.org/abs/2110.11466

Report

Digital transformation of droplet/aerosol infection risk assessment realized on 'Fugaku' for the fight against COVID-19

Authors: Ando, Kazuto; Bale, Rahul; Li, ChungGang; Matsuoka, Satoshi; Onishi, Keiji; Tsubokura, Makoto

The fastest supercomputer in 2020, Fugaku, has not only achieved digital transformation of epidemiology in allowing end-to-end, detailed quantitative modeling of COVID-19 transmissions for the first time, but also transformed the behavior of the enti

External link: http://arxiv.org/abs/2110.09769

Report

Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations

Authors: Chen, Peng; Wahib, Mohamed; Wang, Xiao; Takizawa, Shinichiro; Hirofuchi, Takahiro; Ogawa, Hirotaka; Matsuoka, Satoshi

Computed Tomography (CT) is a key 3D imaging technology that fundamentally relies on the compute-intense back-projection operation to generate 3D volumes. GPUs are typically used for back-projection in production CT devices. However, with the rise of

External link: http://arxiv.org/abs/2104.13248

Report

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

Authors: Domke, Jens; Vatai, Emil; Drozd, Aleksandr; Chen, Peng; Oyama, Yosuke; Zhang, Lingqi; Salaria, Shweta; Mukunoki, Daichi; Podobas, Artur; Wahib, Mohamed; Matsuoka, Satoshi

Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced

External link: http://arxiv.org/abs/2010.14373
