Search results

Report

FastPersist: Accelerating Model Checkpointing in Deep Learning

Authors: Wang, Guanhua, Ruwase, Olatunji, Xie, Bing, He, Yuxiong

Model checkpoints are critical Deep Learning (DL) artifacts that enable fault tolerance for training and downstream applications, such as inference. However, writing checkpoints to persistent storage, and other I/O aspects of DL training, are mostly

External link: http://arxiv.org/abs/2406.13768

Report

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Authors: Xia, Haojun, Zheng, Zhen, Wu, Xiaoxia, Chen, Shiyang, Yao, Zhewei, Youn, Stephen, Bakhtiari, Arash, Wyatt, Michael, Zhuang, Donglin, Zhou, Zhongzhu, Ruwase, Olatunji, He, Yuxiong, Song, Shuaiwen Leon

Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve the model quality consistently across varied applications. However, existing systems do not provide Tensor Core support for FP6 quantization and s

External link: http://arxiv.org/abs/2401.14112

Report

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Authors: Wu, Xiaoxia, Xia, Haojun, Youn, Stephen, Zheng, Zhen, Chen, Shiyang, Bakhtiari, Arash, Wyatt, Michael, Aminabadi, Reza Yazdani, He, Yuxiong, Ruwase, Olatunji, Song, Leon, Yao, Zhewei

This study examines 4-bit quantization methods like GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in Zero-Shot tasks. While prior works focus merely on zero-shot measurement, we extend task scope to

External link: http://arxiv.org/abs/2312.08583

Report

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

Authors: Yao, Zhewei, Aminabadi, Reza Yazdani, Youn, Stephen, Wu, Xiaoxia, Zheng, Elton, He, Yuxiong

Quantization techniques are pivotal in reducing the memory and computational demands of deep neural network inference. Existing solutions, such as ZeroQuant, offer dynamic quantization for models like BERT and GPT but overlook crucial memory-bounded

External link: http://arxiv.org/abs/2310.17723

Report

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Authors: Song, Shuaiwen Leon, Kruft, Bonnie, Zhang, Minjia, Li, Conglong, Chen, Shiyang, Zhang, Chengming, Tanaka, Masahiro, Wu, Xiaoxia, Rasley, Jeff, Awan, Ammar Ahmad, Holmes, Connor, Cai, Martin, Ghanem, Adam, Zhou, Zhongzhu, He, Yuxiong, Luferenko, Pete, Kumar, Divya, Weyn, Jonathan, Zhang, Ruixiong, Klocek, Sylwester, Vragov, Volodymyr, AlQuraishi, Mohammed, Ahdritz, Gustaf, Floristean, Christina, Negri, Cristina, Kotamarthi, Rao, Vishwanath, Venkatram, Ramanathan, Arvind, Foreman, Sam, Hippe, Kyle, Arcomano, Troy, Maulik, Romit, Zvyagin, Maxim, Brace, Alexander, Zhang, Bin, Bohorquez, Cindy Orozco, Clyde, Austin, Kale, Bharat, Perez-Rivera, Danilo, Ma, Heng, Mann, Carla M., Irvin, Michael, Pauloski, J. Gregory, Ward, Logan, Hayot, Valerie, Emani, Murali, Xie, Zhen, Lin, Diangen, Shukla, Maulik, Foster, Ian, Davis, James J., Papka, Michael E., Brettin, Thomas, Balaprakash, Prasanna, Tourassi, Gina, Gounley, John, Hanson, Heidi, Potok, Thomas E, Pasini, Massimiliano Lupo, Evans, Kate, Lu, Dan, Lunga, Dalton, Yin, Junqi, Dash, Sajal, Wang, Feiyi, Shankar, Mallikarjun, Lyngaas, Isaac, Wang, Xiao, Cong, Guojing, Zhang, Pei, Fan, Ming, Liu, Siyan, Hoisie, Adolfy, Yoo, Shinjae, Ren, Yihui, Tang, William, Felker, Kyle, Svyatkovskiy, Alexey, Liu, Hang, Aji, Ashwin, Dalton, Angela, Schulte, Michael, Schulz, Karl, Deng, Yuntian, Nie, Weili, Romero, Josh, Dallago, Christian, Vahdat, Arash, Xiao, Chaowei, Gibbs, Thomas, Anandkumar, Anima, Stevens, Rick

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors fro

External link: http://arxiv.org/abs/2310.04610

Report

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Authors: Jacobs, Sam Ade, Tanaka, Masahiro, Zhang, Chengming, Zhang, Minjia, Song, Shuaiwen Leon, Rajbhandari, Samyam, He, Yuxiong

Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, system works for accelerating LLM training have focused on the first three d

External link: http://arxiv.org/abs/2309.14509

Report

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

Authors: Yao, Zhewei, Wu, Xiaoxia, Li, Conglong, Zhang, Minjia, Qin, Heyang, Ruwase, Olatunji, Awan, Ammar Ahmad, Rajbhandari, Samyam, He, Yuxiong

Most of the existing multi-modal models, hindered by their incapacity to adeptly manage interleaved image-and-text inputs in multi-image, multi-round dialogues, face substantial constraints in resource allocation for training and data accessibility,

External link: http://arxiv.org/abs/2309.14327

Report

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

Authors: Bie, Fengxiang, Yang, Yibo, Zhou, Zhongzhu, Ghanem, Adam, Zhang, Minjia, Yao, Zhewei, Wu, Xiaoxia, Holmes, Connor, Golnari, Pareesa, Clifton, David A., He, Yuxiong, Tao, Dacheng, Song, Shuaiwen Leon

Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions. Text-to-image generation using neural networks could be traced back to the emergence of Generativ

External link: http://arxiv.org/abs/2309.00810

Report

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Authors: Yao, Zhewei, Aminabadi, Reza Yazdani, Ruwase, Olatunji, Rajbhandari, Samyam, Wu, Xiaoxia, Awan, Ammar Ahmad, Rasley, Jeff, Zhang, Minjia, Li, Conglong, Holmes, Connor, Zhou, Zhongzhu, Wyatt, Michael, Smith, Molly, Kurilenko, Lev, Qin, Heyang, Tanaka, Masahiro, Che, Shuai, Song, Shuaiwen Leon, He, Yuxiong

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and c

External link: http://arxiv.org/abs/2308.01320

Report

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

Authors: Wu, Xiaoxia, Yao, Zhewei, He, Yuxiong

In the complex domain of large language models (LLMs), striking a balance between computational efficiency and maintaining model quality is a formidable challenge. Navigating the inherent limitations of uniform quantization, particularly when dealing

External link: http://arxiv.org/abs/2307.09782
