Showing 1 - 10 of 273 for search: '"LIN Haibin"'
Author:
Sheng, Guangming, Zhang, Chi, Ye, Zilingfeng, Wu, Xibin, Zhang, Wang, Zhang, Ru, Peng, Yanghua, Lin, Haibin, Wu, Chuan
Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies …
External link:
http://arxiv.org/abs/2409.19256
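To make the dataflow view in the abstract above concrete, here is a minimal sketch, not the paper's system or API: the node names and the toy run_node function are hypothetical, and it only shows an RLHF-style pipeline expressed as a dependency graph and executed in topological order.

```python
# Illustrative sketch only: modeling an RLHF-style pipeline as a dataflow graph,
# where each node is a neural-network computation and each edge a data dependency.
# Node names and the toy "computations" below are hypothetical, not the paper's API.
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its incoming edges).
dependencies = {
    "actor_generate": set(),                              # actor model produces responses
    "reward_score":   {"actor_generate"},                 # reward model scores responses
    "critic_value":   {"actor_generate"},                 # critic estimates values
    "actor_update":   {"reward_score", "critic_value"},   # PPO-style policy update
    "critic_update":  {"reward_score", "critic_value"},
}

def run_node(name, inputs):
    """Placeholder for launching the NN computation of one dataflow node."""
    print(f"running {name} with inputs from {sorted(inputs)}")
    return f"<output of {name}>"

# Execute nodes in an order that respects all data dependencies.
outputs = {}
for node in TopologicalSorter(dependencies).static_order():
    outputs[node] = run_node(node, dependencies[node])
```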
Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text and audio, achieving significant performance in various domains, including multimodal translation, visual question answering …
External link:
http://arxiv.org/abs/2408.03505
Author:
Wan, Borui, Han, Mingji, Sheng, Yiyao, Peng, Yanghua, Lin, Haibin, Zhang, Mofan, Lai, Zhichao, Yu, Menghan, Zhang, Junda, Song, Zuquan, Liu, Xin, Wu, Chuan
Checkpointing to preserve training states is crucial during the development of Large Foundation Models (LFMs), for training resumption upon various failures or changes in GPU resources and parallelism configurations. In addition, saved checkpoints are …
External link:
http://arxiv.org/abs/2407.20143
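As a rough illustration of the checkpointing idea described above, the following is a generic save/resume sketch assuming plain torch.save/torch.load, not the checkpointing system the paper presents; the file name ckpt.pt and the toy model are hypothetical.

```python
# Illustrative sketch only: generic periodic checkpointing of training state with
# plain PyTorch, not the checkpointing system described in the paper.
import os
import torch

def save_checkpoint(path, step, model, optimizer):
    # Persist everything needed to resume: weights, optimizer state, progress.
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        path,
    )

def load_checkpoint(path, model, optimizer):
    # Restore training state if a checkpoint exists; otherwise start from step 0.
    if not os.path.exists(path):
        return 0
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
start_step = load_checkpoint("ckpt.pt", model, optimizer)
for step in range(start_step, start_step + 100):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()
    if step % 50 == 0:   # checkpoint periodically so failures lose little work
        save_checkpoint("ckpt.pt", step + 1, model, optimizer)
```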
A number of production deep learning clusters have attempted to explore inference hardware for DNN training, at the off-peak serving hours with many inference GPUs idling. Conducting DNN training with a combination of heterogeneous training and inference …
External link:
http://arxiv.org/abs/2407.02327
Author:
Chang, Li-Wen, Bao, Wenlei, Hou, Qi, Jiang, Chengquan, Zheng, Ningxin, Zhong, Yinmin, Zhang, Xuanrun, Song, Zuquan, Jiang, Ziheng, Lin, Haibin, Jin, Xin, Liu, Xin
Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning …
External link:
http://arxiv.org/abs/2406.06858
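To illustrate the basic partitioning idea behind tensor parallelism mentioned above, here is a single-host NumPy sketch in which array shards stand in for per-GPU partitions; it shows only column-wise partitioning of one linear layer, not the communication-overlapping technique of the paper, and all sizes are made up.

```python
# Illustrative sketch only: column-wise tensor parallelism of a single linear layer,
# simulated on one host with NumPy shards standing in for per-GPU partitions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))       # activations, replicated on every "device"
W = rng.standard_normal((16, 32))      # full weight matrix of a linear layer

num_devices = 4
W_shards = np.split(W, num_devices, axis=1)   # each device holds a slice of the columns

# Each device computes its partial output independently...
partial_outputs = [x @ W_shard for W_shard in W_shards]

# ...and gathering along the feature dimension reconstructs the full result.
y_parallel = np.concatenate(partial_outputs, axis=1)

assert np.allclose(y_parallel, x @ W)  # matches the unpartitioned computation
```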
Recent breakthroughs in Large-scale language models (LLMs) have demonstrated impressive performance on various tasks. The immense sizes of LLMs have led to very high resource demand and cost for running the models. Though the models are largely served …
External link:
http://arxiv.org/abs/2403.01136
Author:
Jiang, Ziheng, Lin, Haibin, Zhong, Yinmin, Huang, Qi, Chen, Yangrui, Zhang, Zhi, Peng, Yanghua, Li, Xiang, Xie, Cong, Nong, Shibiao, Jia, Yulu, He, Sun, Chen, Hongmin, Bai, Zhihao, Hou, Qi, Yan, Shipeng, Zhou, Ding, Sheng, Yiyao, Jiang, Zhuo, Xu, Haohan, Wei, Haoran, Zhang, Zhang, Nie, Pengfei, Zou, Leqi, Zhao, Sida, Xiang, Liang, Liu, Zherui, Li, Zhe, Jia, Xiaoying, Ye, Jianxi, Jin, Xin, Liu, Xin
We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented …
External link:
http://arxiv.org/abs/2402.15627
Published in:
EuroSys 2024
Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications. Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks, such as DNN graph- or tensor-level …
External link:
http://arxiv.org/abs/2311.09690
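The paper above concerns latency prediction; as a hedged illustration of how ground-truth timings might be collected, here is a simple measurement sketch with warm-up and repeated trials for a small tensor program on the local device. The matmul workload and the measure_latency helper are hypothetical, not the paper's benchmark harness.

```python
# Illustrative sketch only: directly measuring (not predicting) the latency of a
# small tensor program, with warm-up runs and repeated timed trials.
import time
import numpy as np

def measure_latency(fn, warmup=5, repeats=20):
    for _ in range(warmup):           # warm-up amortizes one-time costs (allocation, caches)
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return float(np.median(timings))  # median is robust to scheduling noise

a = np.random.default_rng(0).standard_normal((512, 512))
b = np.random.default_rng(1).standard_normal((512, 512))
latency_s = measure_latency(lambda: a @ b)
print(f"median matmul latency: {latency_s * 1e3:.2f} ms")
```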
Author:
Wang, Yite, Su, Jiahao, Lu, Hanlin, Xie, Cong, Liu, Tianyi, Yuan, Jianbo, Lin, Haibin, Sun, Ruoyu, Yang, Hongxia
Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models …
External link:
http://arxiv.org/abs/2310.07999
Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions …
External link:
http://arxiv.org/abs/2205.14465
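As one concrete example of a gradient-compression strategy of the kind discussed above, here is a top-k sparsification sketch in NumPy; top-k is just one common method and is not claimed to be the strategy the paper selects, and the helper names and compression ratio are made up.

```python
# Illustrative sketch only: top-k sparsification, one common gradient-compression
# strategy; only the largest-magnitude gradient entries are communicated.
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude entries of a gradient tensor."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest magnitudes
    return idx, flat[idx], grad.shape              # what would actually be sent

def topk_decompress(idx, values, shape):
    """Rebuild a dense (mostly zero) gradient from the communicated sparse entries."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = values
    return flat.reshape(shape)

grad = np.random.default_rng(0).standard_normal((1024, 1024))
idx, values, shape = topk_compress(grad, ratio=0.01)
restored = topk_decompress(idx, values, shape)
print(f"sent {values.size} of {grad.size} values "
      f"({values.size / grad.size:.1%} of the original traffic)")
```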