Search results - "Mowry, Todd C."

Report

ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time

Authors: Fegade, Pratik; Chen, Tianqi; Gibbons, Phillip B.; Mowry, Todd C.

Dynamic control flow is an important technique often used to design expressive and efficient deep learning computations for applications such as text parsing, machine translation, exiting early out of deep models and so on. The control flow divergenc

External link: http://arxiv.org/abs/2305.10611

Report

ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks via Learned Finite State Machines

Authors: Chen, Siyuan; Fegade, Pratik; Chen, Tianqi; Gibbons, Phillip B.; Mowry, Todd C.

Batching has a fundamental influence on the efficiency of deep neural network (DNN) execution. However, for dynamic DNNs, efficient batching is particularly challenging as the dataflow graph varies per input instance. As a result, state-of-the-art fr

External link: http://arxiv.org/abs/2302.03851

Report

The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

Authors: Fegade, Pratik; Chen, Tianqi; Gibbons, Phillip B.; Mowry, Todd C.

There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution

External link: http://arxiv.org/abs/2110.10221

Report

Cortex: A Compiler for Recursive Deep Learning Models

Authors: Fegade, Pratik; Chen, Tianqi; Gibbons, Phillip B.; Mowry, Todd C.

Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low level kernel optimizations such as those found in vendor libraries. This approach often leaves significant perf

External link: http://arxiv.org/abs/2011.01383

Report

RowClone: Accelerating Data Movement and Initialization Using DRAM

Authors: Seshadri, Vivek; Kim, Yoongu; Fallin, Chris; Lee, Donghyuk; Ausavarungnirun, Rachata; Pekhimenko, Gennady; Luo, Yixin; Mutlu, Onur; Gibbons, Phillip B.; Kozuch, Michael A.; Mowry, Todd C.

In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. T

External link: http://arxiv.org/abs/1805.03502

Report

Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM

Authors: Seshadri, Vivek; Lee, Donghyuk; Mullins, Thomas; Hassan, Hasan; Boroumand, Amirali; Kim, Jeremie; Kozuch, Michael A.; Mutlu, Onur; Gibbons, Phillip B.; Mowry, Todd C.

Bitwise operations are an important component of modern day programming. Many widely-used data structures (e.g., bitmap indices in databases) rely on fast bitwise operations on large bit vectors to achieve high performance. Unfortunately, in existing

External link: http://arxiv.org/abs/1611.09988

Report

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps

Authors: Vijaykumar, Nandita; Pekhimenko, Gennady; Jog, Adwait; Ghose, Saugata; Bhowmick, Abhishek; Ausavarungnirun, Rachata; Das, Chita; Kandemir, Mahmut; Mowry, Todd C.; Mutlu, Onur

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilizatio

External link: http://arxiv.org/abs/1602.01348
