Showing 1 - 10 of 25 for search: '"Colbert, Ian"'
Several recent studies have investigated low-precision accumulation, reporting improvements in throughput, power, and area across various platforms. However, the accompanying proposals have only considered the quantization-aware training (QAT) paradigm…
External link: http://arxiv.org/abs/2409.17092
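As an illustration of the accumulator-precision trade-off these entries study, the worst-case bit-width an integer dot product can need is bounded by the weights alone. The sketch below is my own illustrative construction (the function name and bound are assumptions, not the papers' method):

```python
import numpy as np

def min_accumulator_bits(weights_int, act_bits=8):
    """Worst-case signed accumulator width (bits) for a dot product of
    integer weights with signed `act_bits`-bit activations, assuming no
    overflow is allowed. Illustrative bound only."""
    max_act = 2 ** (act_bits - 1) - 1
    # Worst case: every activation sits at its extreme value with a sign
    # matching its weight, so magnitudes add up via the l1 norm.
    worst = int(np.abs(weights_int).sum()) * max_act
    # Magnitude bits to hold `worst`, plus one sign bit.
    return int(np.ceil(np.log2(worst + 1))) + 1

w = np.array([100, -50, 25, -25], dtype=np.int32)  # toy int8-range weights
print(min_accumulator_bits(w))  # 16
```

With these toy weights the worst-case sum is 200 × 127 = 25400, which needs 15 magnitude bits plus a sign bit, hence 16.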
Quantization techniques commonly reduce the inference costs of neural networks by restricting the precision of weights and activations. Recent studies show that also reducing the precision of the accumulator can further improve hardware efficiency at…
External link: http://arxiv.org/abs/2401.10432
We present accumulator-aware quantization (A2Q), a novel weight quantization method designed to train quantized neural networks (QNNs) to avoid overflow when using low-precision accumulators during inference. A2Q introduces a unique formulation inspired…
External link: http://arxiv.org/abs/2308.13504
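The core idea of constraining weights so a low-precision accumulator cannot overflow can be sketched with an ℓ1-norm bound and a simple rescaling projection. This is a hedged illustration only; A2Q's actual formulation uses weight normalization and a bound derived in the paper:

```python
import numpy as np

def l1_bound(acc_bits, act_bits):
    # Largest ||w||_1 (in integer units) such that a dot product with
    # unsigned `act_bits`-bit activations fits in a signed `acc_bits`
    # accumulator: worst case is every activation at its maximum value.
    return (2 ** (acc_bits - 1) - 1) / (2 ** act_bits - 1)

def project_l1(w, bound):
    # Rescale w when its l1 norm exceeds the bound; a simple stand-in
    # for A2Q's weight-normalization parameterization.
    norm = np.abs(w).sum()
    return w if norm <= bound else w * (bound / norm)

bound = l1_bound(acc_bits=16, act_bits=8)   # 32767 / 255 ≈ 128.5
w = project_l1(np.array([90.0, -60.0, 30.0]), bound)
print(np.abs(w).sum() <= bound + 1e-9)  # True
```

Enforcing the constraint during training (rather than projecting once after) is what lets QAT adapt the network to the tighter weight budget.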
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We leverage weight normalization as a means of constraining parameters during training using…
External link: http://arxiv.org/abs/2301.13376
The widespread adoption of deep neural networks in computer vision applications has brought forth a significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model…
External link: http://arxiv.org/abs/2209.06931
Authors: Colbert, Ian; Saeedi, Mehdi
Recent advancements in deep reinforcement learning have brought forth an impressive display of highly skilled artificial agents capable of complex intelligent behavior. In video games, these artificial agents are increasingly deployed as non-playable…
External link: http://arxiv.org/abs/2203.05965
GPU compilers are complex software programs with many optimizations specific to target hardware. These optimizations are often controlled by heuristics hand-designed by compiler experts using time- and resource-intensive processes. In this paper, we…
External link: http://arxiv.org/abs/2111.12055
Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often applied to only…
External link: http://arxiv.org/abs/2110.08271
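Unstructured magnitude pruning, the common baseline the pruning literature builds on, can be sketched in a few lines (my own illustration, not this paper's specific scheme):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights. Ties at the
    threshold may prune slightly more than the requested fraction."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

w = np.array([0.9, -0.1, 0.4, -0.05])
print(magnitude_prune(w, 0.5))
```

At 50% sparsity the two smallest-magnitude entries (-0.1 and -0.05) are zeroed, leaving [0.9, 0.0, 0.4, 0.0].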
A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or sub-pixel convolution…
External link: http://arxiv.org/abs/2107.07647
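Sub-pixel convolution upsamples by having a convolution emit r² output channels per pixel and then rearranging them spatially (depth-to-space). The rearrangement step can be sketched in NumPy, assuming the (C·r², H, W) channel layout used by the common PyTorch `pixel_shuffle` convention:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r): each group of
    r*r channels fills one r-by-r output patch per input pixel."""
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16).reshape(4, 2, 2)  # C*r*r = 4 channels, r = 2
print(pixel_shuffle(x, 2).shape)  # (1, 4, 4)
```

Because the rearrangement is a pure memory permutation, a convolution followed by pixel shuffle upsamples without the redundant computation of convolving a pre-resized input.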
The use of Deep Learning hardware algorithms for embedded applications is characterized by challenges such as constraints on device power consumption, availability of labeled data, and limited internet bandwidth for frequent training on cloud servers…
External link: http://arxiv.org/abs/2102.00534