Showing 1 - 10 of 17 for search: '"Aminabadi, Reza Yazdani"'
Author:
Wu, Xiaoxia, Xia, Haojun, Youn, Stephen, Zheng, Zhen, Chen, Shiyang, Bakhtiari, Arash, Wyatt, Michael, Aminabadi, Reza Yazdani, He, Yuxiong, Ruwase, Olatunji, Song, Leon, Yao, Zhewei
This study examines 4-bit quantization methods such as GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement on zero-shot tasks. While prior works focus merely on zero-shot measurement, we extend the task scope to…
External link:
http://arxiv.org/abs/2312.08583
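As background for this entry, 4-bit weight quantization maps FP16/FP32 weights onto 16 integer levels. A minimal sketch of symmetric round-to-nearest INT4 quantization in NumPy (a toy illustration only; GPTQ itself additionally minimizes layer-wise reconstruction error rather than rounding naively):

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric round-to-nearest 4-bit quantization of a weight tensor.

    Toy sketch: real methods such as GPTQ pick quantized values to
    minimize layer-wise reconstruction error, not by naive rounding.
    """
    # Symmetric INT4 range is [-8, 7]; derive a single per-tensor scale.
    scale = max(np.abs(weights).max() / 7.0, 1e-8)
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from INT4 codes and the scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize_int4(q, s)).max())
```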
Quantization techniques are pivotal in reducing the memory and computational demands of deep neural network inference. Existing solutions, such as ZeroQuant, offer dynamic quantization for models like BERT and GPT but overlook crucial memory-bounded operators…
External link:
http://arxiv.org/abs/2310.17723
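For context, dynamic quantization computes activation scales on the fly at inference time, often per token. A hedged sketch of per-token dynamic INT8 activation quantization (generic, not the ZeroQuant implementation):

```python
import numpy as np

def dynamic_int8_per_token(x: np.ndarray):
    """Per-token dynamic INT8 quantization of activations.

    x has shape (tokens, hidden); each token row gets its own scale,
    computed at inference time from that row's maximum magnitude.
    """
    scales = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-8)  # avoid division by zero
    q = np.clip(np.round(x / scales), -128, 127).astype(np.int8)
    return q, scales

x = np.random.randn(3, 8).astype(np.float32)
q, s = dynamic_int8_per_token(x)
print("max abs error:", np.abs(x - q.astype(np.float32) * s).max())
```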
Author:
Yao, Zhewei, Aminabadi, Reza Yazdani, Ruwase, Olatunji, Rajbhandari, Samyam, Wu, Xiaoxia, Awan, Ammar Ahmad, Rasley, Jeff, Zhang, Minjia, Li, Conglong, Holmes, Connor, Zhou, Zhongzhu, Wyatt, Michael, Smith, Molly, Kurilenko, Lev, Qin, Heyang, Tanaka, Masahiro, Che, Shuai, Song, Shuaiwen Leon, He, Yuxiong
ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective…
External link:
http://arxiv.org/abs/2308.01320
Published in:
Fortieth International Conference on Machine Learning 2023
Improving the deployment efficiency of transformer-based language models has been challenging given their high computation and memory cost. While INT8 quantization has recently been shown to be effective in reducing both the memory cost and latency…
External link:
http://arxiv.org/abs/2301.12017
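A quick back-of-the-envelope illustration of why lower weight precision reduces memory (the parameter count below is hypothetical, not a figure from the paper):

```python
# Weight memory footprint of a hypothetical model, by precision.
params = 6.7e9  # illustrative parameter count
for name, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.1f} GB")
# FP16: 13.4 GB, INT8: 6.7 GB, INT4: 3.4 GB
```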
Author:
Aminabadi, Reza Yazdani, Rajbhandari, Samyam, Zhang, Minjia, Awan, Ammar Ahmad, Li, Cheng, Li, Du, Zheng, Elton, Rasley, Jeff, Smith, Shaden, Ruwase, Olatunji, He, Yuxiong
The past several years have witnessed the success of transformer-based models, and their scale and application scenarios continue to grow aggressively. The current landscape of transformer models is increasingly diverse: the model size varies drastically…
External link:
http://arxiv.org/abs/2207.00032
How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements. In this work, we present an efficient and affordable…
External link:
http://arxiv.org/abs/2206.01861
Author:
Smith, Shaden, Patwary, Mostofa, Norick, Brandon, LeGresley, Patrick, Rajbhandari, Samyam, Casper, Jared, Liu, Zhun, Prabhumoye, Shrimai, Zerveas, George, Korthikanti, Vijay, Zhang, Elton, Child, Rewon, Aminabadi, Reza Yazdani, Bernauer, Julie, Song, Xia, Shoeybi, Mohammad, He, Yuxiong, Houston, Michael, Tiwary, Saurabh, Catanzaro, Bryan
Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success, the size of these models…
External link:
http://arxiv.org/abs/2201.11990
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Author:
Rajbhandari, Samyam, Li, Conglong, Yao, Zhewei, Zhang, Minjia, Aminabadi, Reza Yazdani, Awan, Ammar Ahmad, Rasley, Jeff, He, Yuxiong
As the training of giant dense models hits the boundary on the availability and capability of the hardware resources today, Mixture-of-Experts (MoE) models have become one of the most promising model architectures due to their significant training cost reduction…
External link:
http://arxiv.org/abs/2201.05596
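For intuition, an MoE layer routes each token to a small subset of expert networks, so per-token compute stays roughly constant while the total parameter count grows with the number of experts. A minimal top-1 routing sketch in PyTorch (a toy illustration, not DeepSpeed-MoE's implementation):

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Top-1 gated Mixture-of-Experts layer (toy sketch)."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its single best expert.
        probs = self.gate(x).softmax(dim=-1)
        top_prob, top_idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale by the gate probability so routing stays differentiable.
                out[mask] = top_prob[mask, None] * expert(x[mask])
        return out

moe = ToyMoE(d_model=16, n_experts=4)
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```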
Author:
Ren, Jie, Rajbhandari, Samyam, Aminabadi, Reza Yazdani, Ruwase, Olatunji, Yang, Shuangyan, Zhang, Minjia, Li, Dong, He, Yuxiong
Large-scale model training has been a playground for a limited few, requiring complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes the large model training landscape by making large model training accessible…
External link:
http://arxiv.org/abs/2101.06840
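The core idea behind ZeRO-Offload is to keep optimizer states and the optimizer step in CPU memory while the GPU holds only the FP16 weights and activations needed for forward/backward. A highly simplified sketch of that pattern (illustrative only; the training loop and variable names are invented, not ZeRO-Offload's actual code):

```python
import torch

# Toy illustration of the offload pattern: FP16 compute on GPU,
# FP32 master weights and Adam states kept in CPU RAM.
device = "cuda" if torch.cuda.is_available() else "cpu"
gpu_param = torch.randn(1024, 1024, device=device,
                        dtype=torch.float16, requires_grad=True)
cpu_master = gpu_param.detach().float().cpu()  # FP32 master copy on CPU
optimizer = torch.optim.Adam([cpu_master.requires_grad_()], lr=1e-3)

for _ in range(3):
    loss = (gpu_param.float() ** 2).sum()           # dummy forward on GPU
    loss.backward()
    cpu_master.grad = gpu_param.grad.float().cpu()  # ship gradients to CPU
    optimizer.step()                                # Adam states stay on CPU
    optimizer.zero_grad()
    with torch.no_grad():
        gpu_param.copy_(cpu_master.to(device))      # refresh GPU FP16 weights
    gpu_param.grad = None
```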