Showing 1 - 10 of 178 for search: '"C.1.2"'
Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging in itself
External link:
http://arxiv.org/abs/2410.17563
Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device's compute utilization. However, effectively harnessing it on GPUs for important primitives such as general matrix multiplications (GEMM
External link:
http://arxiv.org/abs/2409.02227
Wireless Network-on-Chip (WNoC) is a promising concept which provides a solution to overcome the scalability issues in prevailing networks-in-package for many-core processors. However, the electromagnetic propagation inside the chip package leads to
External link:
http://arxiv.org/abs/2408.07421
Authors:
Dabak, Beyza; Glenn, Major; Liu, Jingyang; Buck, Alexander; Yang, Siyi; Calderbank, Robert; Jerger, Natalie Enright; Sorin, Daniel J.
Energy is a primary constraint in processor design, and much of that energy is consumed in on-chip communication. Communication can be intra-core (e.g., from a register file to an ALU) or inter-core (e.g., over the on-chip network). In this paper, we
External link:
http://arxiv.org/abs/2405.14783
Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the number of devices increases. While some distributed t
External link:
http://arxiv.org/abs/2401.16677
Intel processors utilize the retirement to orderly retire the micro-ops that have been executed out of order. To enhance retirement utilization, the retirement is dynamically shared between two logical cores on the same physical core. However, this s
External link:
http://arxiv.org/abs/2307.12486
We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm in the Sp
External link:
http://arxiv.org/abs/2307.03276
Authors:
Valassi, Andrea; Childers, Taylor; Field, Laurence; Hageböck, Stephan; Hopkins, Walter; Mattelaer, Olivier; Nichols, Nathan; Roiser, Stefan; Smith, David; Teig, Jorgen; Vuosalo, Carl; Wettersten, Zenny
The matrix element (ME) calculation in any Monte Carlo physics event generator is an ideal fit for implementing data parallelism with lockstep processing on GPUs and vector CPUs. For complex physics processes where the ME calculation is the computati
External link:
http://arxiv.org/abs/2303.18244
Author:
Gesek, George
Published in:
2021 IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, 2021, pp. 32-41
Quantum Computers, once fully realized, can represent an exponential boost in computing power. However, the computational power of the current quantum computers, referred to as Noisy Intermediate Scale Quantum, or NISQ, is severely limited because of
External link:
http://arxiv.org/abs/2302.12750
Power is increasingly becoming a limiting resource in high-performance, GPU-accelerated computing systems. Understanding the range and sources of power variation is essential in setting realistic bounds on rack and system peak power, and developing t
External link:
http://arxiv.org/abs/2212.08805