Showing 1 - 10 of 199 results for search: '"Fedorov, Igor"'
Author:
Annavajjala, Aditya, Khare, Alind, Agrawal, Animesh, Fedorov, Igor, Latapie, Hugo, Lee, Myungjin, Tumanov, Alexey
CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices. This has led to the design and training of CNN architectures with the goal of maximizing accuracy subject to such variable deployment constraints…
External link:
http://arxiv.org/abs/2407.06167
Author:
Liu, Zechun, Zhao, Changsheng, Fedorov, Igor, Soran, Bilge, Choudhary, Dhruv, Krishnamoorthi, Raghuraman, Chandra, Vikas, Tian, Yuandong, Blankevoort, Tijmen
Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present…
External link:
http://arxiv.org/abs/2405.16406
Author:
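The snippet above describes post-training quantization and the error introduced by outliers. As a minimal illustration (plain Python, not the paper's actual method or its outlier handling), symmetric round-to-nearest int8 weight quantization shows how a single outlier inflates the scale and erases small weights:

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest int8 quantization of a list of weights."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # fall back to 1.0 if all zeros
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [qi * scale for qi in q]

# A single outlier (5.0) inflates the scale, so the small weights
# all round to 0 or -1 and lose nearly all their precision:
weights = [0.01, -0.02, 0.015, 5.0]
q, scale = quantize_int8(weights)
recon = dequantize(q, scale)
```

Here `recon` recovers the outlier exactly but flattens the small weights, which is the failure mode the abstract alludes to.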
Liu, Zechun, Zhao, Changsheng, Iandola, Forrest, Lai, Chen, Tian, Yuandong, Fedorov, Igor, Xiong, Yunyang, Chang, Ernie, Shi, Yangyang, Krishnamoorthi, Raghuraman, Lai, Liangzhen, Chandra, Vikas
This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice…
External link:
http://arxiv.org/abs/2402.14905
Neural Architecture Search (NAS) has become a widely used tool for automating neural network design. While one-shot NAS methods have successfully reduced computational requirements, they often require extensive training. On the other hand, zero-shot…
External link:
http://arxiv.org/abs/2311.13169
Author:
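The snippet above contrasts one-shot NAS with zero-shot NAS, which scores candidate architectures at initialization, without training any of them. A toy sketch of that idea (hypothetical candidates; the parameter count used here is only a crude baseline proxy, not the proxy proposed in the paper above):

```python
def param_count(layer_widths):
    """Parameters of a fully connected net given its layer widths
    (weights + biases for each consecutive pair of layers)."""
    return sum(a * b + b for a, b in zip(layer_widths, layer_widths[1:]))

def zero_shot_search(candidates, proxy, budget):
    """Pick the highest-proxy architecture that fits the parameter budget.
    No candidate is ever trained -- the proxy is computed 'for free'."""
    feasible = [c for c in candidates if param_count(c) <= budget]
    return max(feasible, key=proxy)

# Three candidate MLPs: 16-d input, hidden layer of varying width, 10 classes.
candidates = [[16, 32, 10], [16, 64, 10], [16, 128, 10]]
best = zero_shot_search(candidates, proxy=param_count, budget=5000)
```

Real zero-cost proxies (e.g. gradient-norm or saliency scores at initialization) replace `param_count` with a signal that correlates better with trained accuracy; the search loop itself stays this cheap.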
Wen, Wei, Liu, Kuang-Hung, Fedorov, Igor, Zhang, Xin, Yin, Hang, Chu, Weiwei, Hassani, Kaveh, Sun, Mengying, Liu, Jiang, Wang, Xu, Jiang, Lin, Chen, Yuxin, Zhang, Buyun, Liu, Xi, Cheng, Dehua, Chen, Zhengxing, Zhao, Guang, Han, Fangqiu, Yang, Jiyan, Hao, Yuchen, Xiong, Liang, Chen, Wen-Yen
Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and potential for ranking systems. However, prior work focused on academic problems, which are evaluated at small scale under well-controlled fixed baselines. In industry…
External link:
http://arxiv.org/abs/2311.08430
Author:
Zhang, Tunhou, Wen, Wei, Fedorov, Igor, Liu, Xi, Zhang, Buyun, Han, Fangqiu, Chen, Wen-Yen, Han, Yiping, Yan, Feng, Li, Hai, Chen, Yiran
Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design requires…
External link:
http://arxiv.org/abs/2311.00231
Author:
Chai, Yuji, Tripathy, Devashree, Zhou, Chuteng, Gope, Dibakar, Fedorov, Igor, Matas, Ramon, Brooks, David, Wei, Gu-Yeon, Whatmough, Paul
The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN based models. This ability is…
External link:
http://arxiv.org/abs/2301.10999
Author:
Bhardwaj, Kartikeya, Ward, James, Tung, Caleb, Gope, Dibakar, Meng, Lingchuan, Fedorov, Igor, Chalfin, Alex, Whatmough, Paul, Loh, Danny
Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount of…
External link:
http://arxiv.org/abs/2208.08562
Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignoring…
External link:
http://arxiv.org/abs/2202.13826
Author:
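The abstract above describes embeddings whose direction, not magnitude, carries identity. A minimal plain-Python illustration of that property, comparing L2-normalized vectors by cosine similarity (illustrative only, not the method of the paper):

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere, keeping only its direction."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_similarity(a, b):
    """Dot product of unit vectors = cosine of the angle between them."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

# Scaling an embedding does not change its identity score -- only the
# direction matters, which is what "hyperspherical" refers to:
e1 = [1.0, 2.0, 2.0]
e2 = [2.0, 4.0, 4.0]   # same direction, twice the magnitude
score = cosine_similarity(e1, e2)   # approximately 1.0
```

Two embeddings of the same person should point the same way (score near 1), while different identities yield smaller or negative scores.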
Fedorov, Igor, Matas, Ramon, Tann, Hokchhay, Zhou, Chuteng, Mattina, Matthew, Whatmough, Paul
Deploying TinyML models on low-cost IoT hardware is very challenging, due to limited device memory capacity. Neural processing unit (NPU) hardware addresses the memory challenge by using model compression to exploit weight quantization and sparsity to…
External link:
http://arxiv.org/abs/2201.05842