Showing 1 - 10 of 705 for search: '"Pericàs, Miquel A"'
The RISC-V "V" extension introduces vector processing to the RISC-V architecture. Unlike most SIMD extensions, it supports long vectors which can result in significant improvement of multiple applications. In this paper, we present our ongoing resear
External link:
http://arxiv.org/abs/2311.05284
In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance analysis of O
External link:
http://arxiv.org/abs/2311.05267
Energy-efficient execution of task-based parallel applications is crucial as tasking is a widely supported feature in many parallel programming libraries and runtimes. Currently, state-of-the-art proposals primarily rely on leveraging core asymmetry
External link:
http://arxiv.org/abs/2306.04615
As an increasing number of businesses becomes powered by machine-learning, inference becomes a core operation, with a growing trend to be offered as a service. In this context, the inference task must meet certain service-level objectives (SLOs), suc
External link:
http://arxiv.org/abs/2306.01679
CPU-based inference can be an alternative to off-chip accelerators, and vector architectures are a promising option due to their efficiency. However, the large design space of convolutional algorithms and hardware implementations makes it challenging
External link:
http://arxiv.org/abs/2212.11574
OpenMP is the de facto API for parallel programming in HPC applications. These programs are often computed in data centers, where energy consumption is a major issue. Whereas previous work has focused almost entirely on performance, we here analyse a
External link:
http://arxiv.org/abs/2209.04317
Published in:
Journal of Catalysis, December 2024, 440
Authors:
Domke, Jens, Vatai, Emil, Gerofi, Balazs, Kodama, Yuetsu, Wahib, Mohamed, Podobas, Artur, Mittal, Sparsh, Pericàs, Miquel, Zhang, Lingqi, Chen, Peng, Drozd, Aleksandr, Matsuoka, Satoshi
Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate
External link:
http://arxiv.org/abs/2204.02235
Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and
External link:
http://arxiv.org/abs/2202.11575
Parallel applications often rely on work stealing schedulers in combination with fine-grained tasking to achieve high performance and scalability. However, reducing the total energy consumption in the context of work stealing runtimes is still challe
External link:
http://arxiv.org/abs/2201.12186