Showing 1 - 10 of 19 results for search: '"Sah, Sudhakar"'
High runtime memory and high latency put significant constraints on Vision Transformer training and inference, especially on edge devices. Token pruning reduces the number of input tokens to the ViT based on importance criteria of each token. We pres… (a minimal token-pruning sketch follows this entry's link)
External link:
http://arxiv.org/abs/2410.09324
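The snippet above describes importance-based token pruning for Vision Transformers. Below is a minimal, hypothetical sketch of the general idea (keep the top-k tokens per sample according to an externally supplied importance score); the function and tensor names are illustrative and this is not the paper's actual pruning criterion.

```python
# Minimal sketch of importance-based token pruning for a ViT layer.
# Assumes tokens of shape (batch, num_tokens, dim) and a per-token
# importance score (e.g. CLS-attention); all names are illustrative.
import torch

def prune_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep_ratio: float = 0.5):
    """Keep the top-k tokens per sample according to `scores`."""
    batch, num_tokens, dim = tokens.shape
    k = max(1, int(num_tokens * keep_ratio))
    topk = scores.topk(k, dim=1).indices            # (batch, k) indices of most important tokens
    idx = topk.unsqueeze(-1).expand(-1, -1, dim)    # (batch, k, dim)
    return torch.gather(tokens, 1, idx)

# Example: 197 patch tokens pruned to 98 before the next block.
x = torch.randn(8, 197, 768)
importance = torch.rand(8, 197)
print(prune_tokens(x, importance).shape)  # torch.Size([8, 98, 768])
```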
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Different activation functions vary in speed and accuracy, ranging from faster but less accurate options like ReLU to slower but more accurate… (a small timing sketch follows this entry's link)
External link:
http://arxiv.org/abs/2410.10887
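As a rough illustration of the speed differences the snippet above mentions, here is a small timing sketch comparing ReLU and GELU on a CPU tensor; the tensor size and iteration count are arbitrary, and this is not the paper's benchmarking setup.

```python
# Rough illustration of the latency gap between activation functions.
import time
import torch

x = torch.randn(4096, 4096)

def bench(fn, iters: int = 50) -> float:
    """Average wall-clock time per call of fn(x), in seconds."""
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

print(f"ReLU: {bench(torch.nn.functional.relu) * 1e3:.3f} ms/iter")
print(f"GELU: {bench(torch.nn.functional.gelu) * 1e3:.3f} ms/iter")
```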
Authors:
Sah, Sudhakar, Ganji, Darshan C., Grimaldi, Matteo, Kumar, Ravish, Hoffman, Alexander, Rohmetra, Honnesh, Saboori, Ehsan
We introduce MCUBench, a benchmark featuring over 100 YOLO-based object detection models evaluated on the VOC dataset across seven different MCUs. This benchmark provides detailed data on average precision, latency, RAM, and Flash usage for various i…
External link:
http://arxiv.org/abs/2409.18866
Authors:
AskariHemmat, MohammadHossein, Jeddi, Ahmadreza, Hemmat, Reyhane Askari, Lazarevich, Ivan, Hoffman, Alexander, Sah, Sudhakar, Saboori, Ehsan, Savaria, Yvon, David, Jean-Pierre
Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a characteristic th… (a minimal fake-quantization sketch follows this entry's link)
External link:
http://arxiv.org/abs/2404.11769
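The abstract above refers to representing weights and activations with fewer bits. Below is a minimal sketch of generic uniform (affine) fake quantization, showing how the bit-width sets the size of the representable grid; it is a textbook-style illustration, not the quantization scheme studied in the paper.

```python
# Minimal sketch of uniform affine fake quantization of a weight tensor to b bits.
import torch

def quantize_dequantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    qmin, qmax = 0, 2 ** bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)          # step between grid points
    zero_point = qmin - torch.round(w.min() / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale                      # dequantized ("fake-quantized") weights

w = torch.randn(256, 256)
w_q = quantize_dequantize(w, bits=4)
print((w - w_q).abs().mean())  # mean quantization error at 4 bits
```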
Authors:
Ashfaq, Saad, Hoffman, Alexander, Mitra, Saptarshi, Sah, Sudhakar, AskariHemmat, MohammadHossein, Saboori, Ehsan
The proliferation of edge devices has unlocked unprecedented opportunities for deep learning model deployment in computer vision applications. However, these complex models require considerable power, memory and compute resources that are typically n…
External link:
http://arxiv.org/abs/2309.10878
The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency. It is k… (a sparsity-measurement sketch follows this entry's link)
External link:
http://arxiv.org/abs/2309.06626
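Since the snippet above is about exploiting sparsity in feature maps, here is a tiny sketch of how activation sparsity (the fraction of exact zeros after ReLU) can be measured; the layer shape is made up and this is not the paper's method.

```python
# Sketch of measuring feature-map (activation) sparsity after ReLU,
# the quantity that sparsity-aware inference engines can exploit.
import torch
import torch.nn.functional as F

def activation_sparsity(x: torch.Tensor) -> float:
    """Fraction of exactly-zero elements in a feature map."""
    return (x == 0).float().mean().item()

feat = F.relu(torch.randn(1, 64, 56, 56))  # toy post-ReLU conv feature map
print(f"sparsity: {activation_sparsity(feat):.2%}")  # ~50% for random input
```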
Authors:
Lazarevich, Ivan, Grimaldi, Matteo, Kumar, Ravish, Mitra, Saptarshi, Khan, Shahrukh, Sah, Sudhakar
We present YOLOBench, a benchmark comprised of 550+ YOLO-based object detection models on 4 different datasets and 4 different embedded hardware platforms (x86 CPU, ARM CPU, Nvidia GPU, NPU). We collect accuracy and latency numbers for a variety of Y…
External link:
http://arxiv.org/abs/2307.13901
Authors:
Ganji, Darshan C., Ashfaq, Saad, Saboori, Ehsan, Sah, Sudhakar, Mitra, Saptarshi, AskariHemmat, MohammadHossein, Hoffman, Alexander, Hassanien, Ahmed, Léonardon, Mathieu
A lot of recent progress has been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve mode… (a simplified LSQ sketch follows this entry's link)
External link:
http://arxiv.org/abs/2304.09049
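The entry above mentions Learned Step Size Quantization (LSQ). Below is a simplified sketch of the core LSQ idea, a learnable step size with a straight-through estimator through the rounding operation; it omits the per-layer gradient scaling of the original method, and the class name is illustrative.

```python
# Toy LSQ-style weight quantizer: the quantization step size is a learnable
# parameter, and rounding uses a straight-through gradient. Simplified sketch.
import torch
import torch.nn as nn

class LSQQuantizer(nn.Module):
    def __init__(self, bits: int = 2, init_step: float = 0.1):
        super().__init__()
        self.qmin = -(2 ** (bits - 1))
        self.qmax = 2 ** (bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(init_step))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        v = w / self.step
        # Round with a straight-through gradient, then clamp to the b-bit grid.
        v_q = torch.clamp(v + (torch.round(v) - v).detach(), self.qmin, self.qmax)
        return v_q * self.step

quant = LSQQuantizer(bits=2)
w = torch.randn(128, 128, requires_grad=True)
loss = quant(w).pow(2).sum()
loss.backward()
print(w.grad is not None, quant.step.grad is not None)  # True True
```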
Authors:
Ashfaq, Saad, AskariHemmat, MohammadHossein, Sah, Sudhakar, Saboori, Ehsan, Mastropietro, Olivier, Hoffman, Alexander
Deep Learning has been one of the most disruptive technological advancements in recent times. The high performance of deep learning models comes at the expense of high computational, storage and power requirements. Sensing the immediate need for acce…
External link:
http://arxiv.org/abs/2207.08820
Authors:
AskariHemmat, MohammadHossein, Hemmat, Reyhane Askari, Hoffman, Alex, Lazarevich, Ivan, Saboori, Ehsan, Mastropietro, Olivier, Sah, Sudhakar, Savaria, Yvon, David, Jean-Pierre
In this paper we study the effects of quantization in DNN training. We hypothesize that weight quantization is a form of regularization and that the amount of regularization is correlated with the quantization level (precision). We confirm our hypothesis… (a small illustrative sketch follows this entry's link)
External link:
http://arxiv.org/abs/2206.12372
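To make the regularization intuition above concrete, here is a small sketch showing that fake-quantizing weights perturbs them by an amount set by the quantization step, which grows as the bit-width shrinks; the tensor and bit-widths are arbitrary examples, not the paper's experiments.

```python
# Sketch of the "quantization as regularization" view: fake-quantizing weights
# at b bits perturbs them by at most half a quantization step, and the step
# (hence the implicit regularization strength) grows as b shrinks.
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    step = (w.max() - w.min()) / (2 ** bits - 1)
    return torch.round(w / step) * step

w = torch.randn(1000)
for bits in (8, 4, 2):
    noise = (fake_quantize(w, bits) - w).abs().mean()
    print(f"{bits}-bit: mean weight perturbation {noise:.4f}")
```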