Showing 1 - 10 of 172 for search: '"Mowry, Todd C."'
Author:
Lai, Ruihang, Shao, Junru, Feng, Siyuan, Lyubomirsky, Steven S., Hou, Bohan, Lin, Wuwei, Ye, Zihao, Jin, Hongyi, Jin, Yuchen, Liu, Jiawei, Jin, Lesheng, Cai, Yaxing, Jiang, Ziheng, Wu, Yong, Park, Sunghyun, Srivastava, Prakalp, Roesch, Jared G., Mowry, Todd C., Chen, Tianqi
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this pap
External link:
http://arxiv.org/abs/2311.02103
Dynamic control flow is an important technique often used to design expressive and efficient deep learning computations for applications such as text parsing, machine translation, exiting early out of deep models and so on. The control flow divergenc
External link:
http://arxiv.org/abs/2305.10611
Batching has a fundamental influence on the efficiency of deep neural network (DNN) execution. However, for dynamic DNNs, efficient batching is particularly challenging as the dataflow graph varies per input instance. As a result, state-of-the-art fr
External link:
http://arxiv.org/abs/2302.03851
There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution
External link:
http://arxiv.org/abs/2110.10221
Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low level kernel optimizations such as those found in vendor libraries. This approach often leaves significant perf
External link:
http://arxiv.org/abs/2011.01383
Author:
Seshadri, Vivek, Kim, Yoongu, Fallin, Chris, Lee, Donghyuk, Ausavarungnirun, Rachata, Pekhimenko, Gennady, Luo, Yixin, Mutlu, Onur, Gibbons, Phillip B., Kozuch, Michael A., Mowry, Todd C.
In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. T
External link:
http://arxiv.org/abs/1805.03502
Author:
Seshadri, Vivek, Lee, Donghyuk, Mullins, Thomas, Hassan, Hasan, Boroumand, Amirali, Kim, Jeremie, Kozuch, Michael A., Mutlu, Onur, Gibbons, Phillip B., Mowry, Todd C.
Bitwise operations are an important component of modern day programming. Many widely-used data structures (e.g., bitmap indices in databases) rely on fast bitwise operations on large bit vectors to achieve high performance. Unfortunately, in existing
External link:
http://arxiv.org/abs/1611.09988
Author:
Vijaykumar, Nandita, Pekhimenko, Gennady, Jog, Adwait, Ghose, Saugata, Bhowmick, Abhishek, Ausavarungnirun, Rachata, Das, Chita, Kandemir, Mahmut, Mowry, Todd C., Mutlu, Onur
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilizatio
External link:
http://arxiv.org/abs/1602.01348