Výsledky vyhledávání - "Fedorov, Igor"

Report

D{\epsilon}pS: Delayed {\epsilon}-Shrinking for Faster Once-For-All Training

Autor: Annavajjala, Aditya, Khare, Alind, Agrawal, Animesh, Fedorov, Igor, Latapie, Hugo, Lee, Myungjin, Tumanov, Alexey

CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices. This has led to the design and training of CNN architectures with the goal of maximizing accuracy subject to such variable deployment cons

Externí odkaz: http://arxiv.org/abs/2407.06167

Zobrazit plný text záznamu

Report

SpinQuant: LLM quantization with learned rotations

Autor: Liu, Zechun, Zhao, Changsheng, Fedorov, Igor, Soran, Bilge, Choudhary, Dhruv, Krishnamoorthi, Raghuraman, Chandra, Vikas, Tian, Yuandong, Blankevoort, Tijmen

Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are pre

Externí odkaz: http://arxiv.org/abs/2405.16406

Zobrazit plný text záznamu

Report

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Autor: Liu, Zechun, Zhao, Changsheng, Iandola, Forrest, Lai, Chen, Tian, Yuandong, Fedorov, Igor, Xiong, Yunyang, Chang, Ernie, Shi, Yangyang, Krishnamoorthi, Raghuraman, Lai, Liangzhen, Chandra, Vikas

This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice

Externí odkaz: http://arxiv.org/abs/2402.14905

Zobrazit plný text záznamu

Report

SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape

Autor: Zheng, Hua, Liu, Kuang-Hung, Fedorov, Igor, Zhang, Xin, Chen, Wen-Yen, Wen, Wei

Neural Architecture Search (NAS) has become a widely used tool for automating neural network design. While one-shot NAS methods have successfully reduced computational requirements, they often require extensive training. On the other hand, zero-shot

Externí odkaz: http://arxiv.org/abs/2311.13169

Zobrazit plný text záznamu

Report

Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale

Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and potential for ranking systems. However, prior work focused on academic problems, which are evaluated at small scale under well-controlled fixed baselines. In indust

Externí odkaz: http://arxiv.org/abs/2311.08430

Zobrazit plný text záznamu

Report

DistDNAS: Search Efficient Feature Interactions within 2 Hours

Autor: Zhang, Tunhou, Wen, Wei, Fedorov, Igor, Liu, Xi, Zhang, Buyun, Han, Fangqiu, Chen, Wen-Yen, Han, Yiping, Yan, Feng, Li, Hai, Chen, Yiran

Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design requires

Externí odkaz: http://arxiv.org/abs/2311.00231

Zobrazit plný text záznamu

Report

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

Autor: Chai, Yuji, Tripathy, Devashree, Zhou, Chuteng, Gope, Dibakar, Fedorov, Igor, Matas, Ramon, Brooks, David, Wei, Gu-Yeon, Whatmough, Paul

The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN based models. This ability i

Externí odkaz: http://arxiv.org/abs/2301.10999

Zobrazit plný text záznamu

Report

Restructurable Activation Networks

Autor: Bhardwaj, Kartikeya, Ward, James, Tung, Caleb, Gope, Dibakar, Meng, Lingchuan, Fedorov, Igor, Chalfin, Alex, Whatmough, Paul, Loh, Danny

Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount o

Externí odkaz: http://arxiv.org/abs/2208.08562

Zobrazit plný text záznamu

Report

Magnitude-aware Probabilistic Speaker Embeddings

Autor: Kuzmin, Nikita, Fedorov, Igor, Sholokhov, Alexey

Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignorin

Externí odkaz: http://arxiv.org/abs/2202.13826

Zobrazit plný text záznamu

Report

UDC: Unified DNAS for Compressible TinyML Models

Autor: Fedorov, Igor, Matas, Ramon, Tann, Hokchhay, Zhou, Chuteng, Mattina, Matthew, Whatmough, Paul

Deploying TinyML models on low-cost IoT hardware is very challenging, due to limited device memory capacity. Neural processing unit (NPU) hardware address the memory challenge by using model compression to exploit weight quantization and sparsity to

Externí odkaz: http://arxiv.org/abs/2201.05842

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání