Search results

Report

Efficiently Scheduling Parallel DAG Tasks on Identical Multiprocessors

Authors: Lendve, Shardul, Bletsas, Konstantinos, Souto, Pedro F.

Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging in itself

External link: http://arxiv.org/abs/2410.17563

Report

Global Optimizations & Lightweight Dynamic Logic for Concurrency

Authors: Pati, Suchita, Aga, Shaizeen, Jayasena, Nuwan, Sinclair, Matthew D.

Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device's compute utilization. However, effectively harnessing it on GPUs for important primitives such as general matrix multiplications (GEMM

External link: http://arxiv.org/abs/2409.02227

Report

A MAC Protocol with Time Reversal for Wireless Networks within Computing Packages

Authors: Bandara, Ama, Das, Abhijit, Rodríguez-Galán, Fátima, Alarcón, Eduard, Abadal, Sergi

Wireless Network-on-Chip (WNoC) is a promising concept which provides a solution to overcome the scalability issues in prevailing networks-in-package for many-core processors. However, the electromagnetic propagation inside the chip package leads to

External link: http://arxiv.org/abs/2408.07421

Report

Low-Energy Line Codes for On-Chip Networks

Authors: Dabak, Beyza, Glenn, Major, Liu, Jingyang, Buck, Alexander, Yang, Siyi, Calderbank, Robert, Jerger, Natalie Enright, Sorin, Daniel J.

Energy is a primary constraint in processor design, and much of that energy is consumed in on-chip communication. Communication can be intra-core (e.g., from a register file to an ALU) or inter-core (e.g., over the on-chip network). In this paper, we

External link: http://arxiv.org/abs/2405.14783

Report

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives

Authors: Pati, Suchita, Aga, Shaizeen, Islam, Mahzabeen, Jayasena, Nuwan, Sinclair, Matthew D.

Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the number of devices increases. While some distributed t

External link: http://arxiv.org/abs/2401.16677

Report

New Covert and Side Channels Based on Retirement

Authors: Xu, Ke, Tang, Ming, Wang, Quancheng, Wang, Han

Intel processors utilize the retirement to orderly retire the micro-ops that have been executed out of order. To enhance retirement utilization, the retirement is dynamically shared between two logical cores on the same physical core. However, this s

External link: http://arxiv.org/abs/2307.12486

Report

Analyzing the Performance Portability of Tensor Decomposition

Authors: Anderson, S. Isaac Geronimo, Teranishi, Keita, Dunlavy, Daniel M., Choi, Jee

We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm in the Sp

External link: http://arxiv.org/abs/2307.03276

Report

Speeding up Madgraph5 aMC@NLO through CPU vectorization and GPU offloading: towards a first alpha release

Authors: Valassi, Andrea, Childers, Taylor, Field, Laurence, Hageböck, Stephan, Hopkins, Walter, Mattelaer, Olivier, Nichols, Nathan, Roiser, Stefan, Smith, David, Teig, Jorgen, Vuosalo, Carl, Wettersten, Zenny

The matrix element (ME) calculation in any Monte Carlo physics event generator is an ideal fit for implementing data parallelism with lockstep processing on GPUs and vector CPUs. For complex physics processes where the ME calculation is the computati

External link: http://arxiv.org/abs/2303.18244

Report

A Uniform Quantum Computing Model Based on Virtual Quantum Processors

Author: Gesek, George

Published in: 2021 IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, 2021, pp. 32-41

Quantum Computers, once fully realized, can represent an exponential boost in computing power. However, the computational power of the current quantum computers, referred to as Noisy Intermediate Scale Quantum, or NISQ, is severely limited because of

External link: http://arxiv.org/abs/2302.12750

Report

Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power

Authors: Bhalachandra, Sridutt, Austin, Brian, Williams, Samuel, Wright, Nicholas J.

Power is increasingly becoming a limiting resource in high-performance, GPU-accelerated computing systems. Understanding the range and sources of power variation is essential in setting realistic bounds on rack and system peak power, and developing t

External link: http://arxiv.org/abs/2212.08805
