Výsledky vyhledávání - "Diao, Lansong"

Report

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Autor: Zhang, Shiwei, Diao, Lansong, Wu, Chuan, Cao, Zongyan, Wang, Siyu, Lin, Wei

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large model learning

Externí odkaz: http://arxiv.org/abs/2401.05965

Zobrazit plný text záznamu

Report

Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches

Autor: Wang, Siyu, Cao, Zongyan, Si, Chang, Diao, Lansong, Wang, Jiamang, Lin, Wei

Pipeline parallelism has been demonstrated to be a remarkable approach to improve throughput for training deep neural networks with billions of parameters over heterogeneous clusters. The 1F1B scheduling plan is a widely adopted strategy for memory a

Externí odkaz: http://arxiv.org/abs/2303.01675

Zobrazit plný text záznamu

Report

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform

Autor: Zhang, Shiwei, Diao, Lansong, Wang, Siyu, Cao, Zongyan, Gu, Yiliang, Si, Chang, Shi, Ziji, Zheng, Zhen, Wu, Chuan, Lin, Wei

We present Rhino, a system for accelerating tensor programs with automatic parallelization on AI platform for real production environment. It transforms a tensor program written for a single device into an equivalent distributed program that is capab

Externí odkaz: http://arxiv.org/abs/2302.08141

Zobrazit plný text záznamu

Report

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment

Autor: Zhang, Shiwei, Yi, Xiaodong, Diao, Lansong, Wu, Chuan, Wang, Siyu, Lin, Wei

This paper presents TAG, an automatic system to derive optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology- heterogeneous ML clusters. We novelly combine both the DNN computation gr

Externí odkaz: http://arxiv.org/abs/2302.06126

Zobrazit plný text záznamu

Report

Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion

Autor: Yi, Xiaodong, Zhang, Shiwei, Diao, Lansong, Wu, Chuan, Zheng, Zhen, Fan, Shiqing, Wang, Siyu, Yang, Jun, Lin, Wei

Publikováno v: IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4694-4706, 1 Dec. 2022

This paper proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most deep learning compilers that focus on training or inference on a single device, DisCo optimizes a DNN model for distributed t

Externí odkaz: http://arxiv.org/abs/2209.12769

Zobrazit plný text záznamu

Report

DISC: A Dynamic Shape Compiler for Machine Learning Workloads

Autor: Zhu, Kai, Zhao, Wenyi, Zheng, Zhen, Guo, Tianyou, Zhao, Pengzhan, Zhu, Feiwen, Bai, Junjie, Yang, Jun, Liu, Xiaoyong, Diao, Lansong, Lin, Wei

Many recent machine learning models show dynamic shape characteristics. However, existing AI compiler optimization systems suffer a lot from problems brought by dynamic shape models, including compilation overhead, memory usage, optimization pipeline

Externí odkaz: http://arxiv.org/abs/2103.05288

Zobrazit plný text záznamu

Report

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads

Autor: Zheng, Zhen, Zhao, Pengzhan, Long, Guoping, Zhu, Feiwen, Zhu, Kai, Zhao, Wenyi, Diao, Lansong, Yang, Jun, Lin, Wei

We show in this work that memory intensive computations can result in severe performance problems due to off-chip memory access and CPU-GPU context switch overheads in a wide range of deep learning models. For this problem, current just-in-time (JIT)

Externí odkaz: http://arxiv.org/abs/2009.10924

Zobrazit plný text záznamu

Report

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads

Autor: Wang, Siyu, Rong, Yi, Fan, Shiqing, Zheng, Zhen, Diao, LanSong, Long, Guoping, Yang, Jun, Liu, Xiaoyong, Lin, Wei

The last decade has witnessed growth in the computational requirements for training deep neural networks. Current approaches (e.g., data/model parallelism, pipeline parallelism) parallelize training tasks onto multiple devices. However, these approac

Externí odkaz: http://arxiv.org/abs/2007.04069

Zobrazit plný text záznamu

Report

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

Autor: Fan, Shiqing, Rong, Yi, Meng, Chen, Cao, Zongyan, Wang, Siyu, Zheng, Zhen, Wu, Chuan, Long, Guoping, Yang, Jun, Xia, Lixue, Diao, Lansong, Liu, Xiaoyong, Lin, Wei

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are

Externí odkaz: http://arxiv.org/abs/2007.01045

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání