Showing 1 - 10 of 63 results for search: '"Gopalakrishnan, Kailash"'
Author:
Fasoli, Andrea, Chen, Chia-Yu, Serrano, Mauricio, Venkataramani, Swagath, Saon, George, Cui, Xiaodong, Kingsbury, Brian, Gopalakrishnan, Kailash
We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T). We use a 4-bit integer representation for both weights and activations and apply Quantization Aware Training (QAT) to…
External link:
http://arxiv.org/abs/2206.07882
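The 4-bit QAT scheme in the entry above is not reproduced here, but the core idea of fake quantization (quantize-dequantize in the forward pass so training adapts to the reduced precision) can be sketched in a minimal NumPy example; the function name and the symmetric per-tensor scaling are illustrative assumptions, not the paper's method:

```python
import numpy as np

def fake_quantize(x, num_bits=4):
    # Symmetric uniform quantization: round x onto a signed int4 grid,
    # then dequantize back to float so gradients can flow in training
    qmax = 2 ** (num_bits - 1) - 1                  # 7 for 4-bit signed
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # int4 range [-8, 7]
    return q * scale                                 # dequantized values

w = np.array([0.9, -0.35, 0.02, -0.8])
wq = fake_quantize(w)   # element-wise error bounded by scale / 2
```

In QAT the backward pass typically treats the rounding as identity (a straight-through estimator), which this forward-only sketch omits.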
Author:
Fasoli, Andrea, Chen, Chia-Yu, Serrano, Mauricio, Sun, Xiao, Wang, Naigang, Venkataramani, Swagath, Saon, George, Cui, Xiaodong, Kingsbury, Brian, Zhang, Wei, Tüske, Zoltán, Gopalakrishnan, Kailash
We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-H…
External link:
http://arxiv.org/abs/2108.12074
Author:
Chen, Chia-Yu, Ni, Jiamin, Lu, Songtao, Cui, Xiaodong, Chen, Pin-Yu, Sun, Xiao, Wang, Naigang, Venkataramani, Swagath, Srinivasan, Vijayalakshmi, Zhang, Wei, Gopalakrishnan, Kailash
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demo…
External link:
http://arxiv.org/abs/2104.11125
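The gradient-compression entry above does not show the method itself; as a generic illustration (not the paper's algorithm), top-k sparsification with an error-feedback residual is a common baseline in this literature and can be sketched as:

```python
import numpy as np

def topk_compress(grad, k):
    # Keep only the k largest-magnitude entries for communication;
    # the dropped mass is returned as a residual for error feedback,
    # to be added back into the next step's gradient locally.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse, grad - sparse

g = np.array([0.1, -2.0, 0.3, 1.5, -0.05])
sparse, residual = topk_compress(g, k=2)   # transmit only 2 of 5 values
```

Error feedback matters because naive top-k silently discards small gradients every step; accumulating them in the residual preserves convergence.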
Author:
Sun, Ximeng, Panda, Rameswar, Chen, Chun-Fu, Wang, Naigang, Pan, Bowen, Gopalakrishnan, Kailash, Oliva, Aude, Feris, Rogerio, Saenko, Kate
Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models for differe…
External link:
http://arxiv.org/abs/2103.01435
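The adaptive bit-width idea above (one set of weights served at several precisions instead of one model per precision) can be illustrated with a toy sketch; the quantizer below is a generic uniform scheme, an assumption rather than the paper's method:

```python
import numpy as np

def quantize_at(w, num_bits):
    # Uniform symmetric quantization of the *shared* weights at a
    # bit-width chosen at inference time.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

w = np.random.default_rng(0).normal(size=100)   # one shared weight tensor
# The same tensor evaluated at three bit-widths; finer grids give lower MSE.
err = {b: float(np.mean((quantize_at(w, b) - w) ** 2)) for b in (2, 4, 8)}
```

A deployment can then pick the bit-width matching each device's budget without retraining.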
Author:
Fu, Yonggan, You, Haoran, Zhao, Yang, Wang, Yue, Li, Chaojian, Gopalakrishnan, Kailash, Wang, Zhangyang, Lin, Yingyan
Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at…
External link:
http://arxiv.org/abs/2012.13113
Author:
Sakr, Charbel, Wang, Naigang, Chen, Chia-Yu, Choi, Jungwook, Agrawal, Ankur, Shanbhag, Naresh, Gopalakrishnan, Kailash
Efforts to reduce the numerical precision of computations in deep learning training have yielded systems that aggressively quantize weights and activations, yet employ wide high-precision accumulators for partial sums in inner-product operations to p…
External link:
http://arxiv.org/abs/1901.06588
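The accumulation-precision concern in the entry above can be shown with a toy experiment (not the paper's analysis): summing many same-sign partial products in a float16 accumulator eventually stagnates once the running sum dwarfs each addend ("swamping"), while a wide accumulator stays exact.

```python
import numpy as np

def dot_lowprec(a, b, acc_dtype):
    # Inner product with the partial-sum accumulator held in acc_dtype;
    # each add and multiply is rounded to that precision.
    acc = acc_dtype(0)
    for x, y in zip(a, b):
        acc = acc_dtype(acc + acc_dtype(x) * acc_dtype(y))
    return float(acc)

a = np.ones(4096, dtype=np.float32)
narrow = dot_lowprec(a, a, np.float16)   # stalls: 2048 + 1 rounds back to 2048
wide = dot_lowprec(a, a, np.float64)     # exact
```

This is why low-precision training systems size the accumulator separately from the multiplier operands.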
The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storag…
External link:
http://arxiv.org/abs/1812.08011
Author:
Choi, Jungwook, Chuang, Pierce I-Jen, Wang, Zhuo, Venkataramani, Swagath, Srinivasan, Vijayalakshmi, Gopalakrishnan, Kailash
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. In order to reduce this cost, several quantization schemes have gained attention recently, with some focusing on weight quantization and oth…
External link:
http://arxiv.org/abs/1807.06964
Author:
Choi, Jungwook, Wang, Zhuo, Venkataramani, Swagath, Chuang, Pierce I-Jen, Srinivasan, Vijayalakshmi, Gopalakrishnan, Kailash
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, whic…
External link:
http://arxiv.org/abs/1805.06085
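The entry above concerns activation (rather than weight) quantization. A minimal sketch of clipping-based activation quantization follows; in trained schemes of this kind the clipping threshold alpha is a learned parameter, whereas here it is fixed as an illustrative assumption:

```python
import numpy as np

def clipped_act_quantize(x, alpha=6.0, num_bits=4):
    # Clip activations to [0, alpha], then quantize uniformly onto
    # 2**num_bits - 1 steps; clipping bounds the quantization range.
    levels = 2 ** num_bits - 1
    y = np.clip(x, 0.0, alpha)
    scale = alpha / levels
    return np.round(y / scale) * scale

acts = np.array([-1.0, 0.5, 2.3, 7.0])
q = clipped_act_quantize(acts)   # all outputs land in [0, alpha]
```

Bounding the range with a clip is what makes a small number of uniform levels usable for unbounded activations like ReLU outputs.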
Author:
Chen, Chia-Yu, Choi, Jungwook, Brand, Daniel, Agrawal, Ankur, Zhang, Wei, Gopalakrishnan, Kailash
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression t…
External link:
http://arxiv.org/abs/1712.02679