Showing 1 - 10 of 10,732 results for search: '"distributed training"'
Author:
Twomey, Beth (btwomey@udel.edu); Johnson, Annie (akjohnso@udel.edu); Estes, Colleen (cestes@udel.edu)
Published in:
Information Technology & Libraries, Sep 2024, Vol. 43, Issue 3, pp. 1-8.
Distributed training methods are crucial for large language models (LLMs). However, existing distributed training methods often suffer from communication bottlenecks, stragglers, and limited elasticity. Local SGD methods have been proposed to address…
External link:
http://arxiv.org/abs/2412.07210
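The Local SGD idea referenced in this abstract reduces communication by letting each worker take several local gradient steps between synchronizations, after which the worker models are averaged. A minimal single-process sketch of that pattern, assuming a toy quadratic objective and illustrative worker/step counts (none of these details come from the paper):

    import numpy as np

    # Toy Local SGD: K workers each take H local steps on their own data
    # shard, then the models are averaged (one communication round).
    # Objective, sizes, and hyperparameters are illustrative assumptions.
    rng = np.random.default_rng(0)
    K, H, rounds, lr, dim = 4, 8, 20, 0.1, 5
    targets = [rng.normal(size=dim) for _ in range(K)]  # per-worker data

    def local_grad(w, target):
        # Gradient of the worker's loss 0.5 * ||w - target||^2
        return w - target

    w_global = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for k in range(K):
            w = w_global.copy()
            for _ in range(H):              # H steps with no communication
                w -= lr * local_grad(w, targets[k])
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)  # single averaging round

    print("averaged model:", w_global)

With H = 1 this reduces to fully synchronous SGD; larger H trades gradient freshness for fewer communication rounds, which is the lever behind the bottleneck, straggler, and elasticity discussion above.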
Author:
Feng, Yicheng, Chen, Yuetao, Chen, Kaiwen, Li, Jingzong, Wu, Tianyuan, Cheng, Peng, Wu, Chuan, Wang, Wei, Ho, Tsung-Yi, Xu, Hong
Simulation offers unique values for both enumeration and extrapolation purposes, and is becoming increasingly important for managing the massive machine learning (ML) clusters and large-scale distributed training jobs. In this paper, we build Echo to…
External link:
http://arxiv.org/abs/2412.12487
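Simulators of this kind predict quantities such as per-step training time without running the full cluster. A back-of-the-envelope sketch of the sort of estimate involved, using a simple compute-plus-ring-all-reduce cost model; the formula and every number below are illustrative assumptions, not Echo's actual model:

    # Rough analytical estimate of one data-parallel training step:
    # per-GPU compute plus ring all-reduce of the gradients. The cost
    # model and all numbers are illustrative assumptions.
    num_gpus = 64
    compute_time_s = 0.35            # assumed forward+backward time per GPU
    grad_bytes = 2 * 7e9             # fp16 gradients of an assumed 7B model
    link_bandwidth_Bps = 100e9       # assumed 100 GB/s effective bandwidth

    # Ring all-reduce moves about 2*(N-1)/N of the gradient volume per GPU.
    allreduce_s = 2 * (num_gpus - 1) / num_gpus * grad_bytes / link_bandwidth_Bps
    step_s = compute_time_s + allreduce_s
    print(f"estimated step time: {step_s:.3f} s (all-reduce {allreduce_s:.3f} s)")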
Author:
Fernandez, Jared, Wehrstedt, Luca, Shamis, Leonid, Elhoushi, Mostafa, Saladi, Kalyan, Bisk, Yonatan, Strubell, Emma, Kahn, Jacob
Dramatic increases in the capabilities of neural network models in recent years are driven by scaling model size, training data, and corresponding computational resources. To develop the exceedingly large networks required in modern applications, suc…
External link:
http://arxiv.org/abs/2411.13055
Recent advances in Generative Artificial Intelligence have fueled numerous applications, particularly those involving Generative Adversarial Networks (GANs), which are essential for synthesizing realistic photos and videos. However, efficiently train…
External link:
http://arxiv.org/abs/2411.03999
Distributed machine learning has recently become a critical paradigm for training large models on vast datasets. We examine the stochastic optimization problem for deep learning within synchronous parallel computing environments under communication c…
External link:
http://arxiv.org/abs/2411.03742
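One common way to model a communication constraint in synchronous data-parallel SGD is to compress each worker's gradient before it is averaged. A toy sketch using top-k sparsification as a stand-in compressor (the compressor choice and all sizes are assumptions for illustration, not necessarily the paper's method):

    import numpy as np

    # Synchronous data-parallel SGD with top-k gradient sparsification
    # as an illustrative communication-reducing compressor.
    rng = np.random.default_rng(1)
    K, steps, lr, k_keep, dim = 4, 50, 0.05, 2, 10
    targets = [rng.normal(size=dim) for _ in range(K)]  # per-worker data

    def top_k(g, k):
        # Keep only the k largest-magnitude entries; zero the rest.
        out = np.zeros_like(g)
        idx = np.argsort(np.abs(g))[-k:]
        out[idx] = g[idx]
        return out

    w = np.zeros(dim)
    for _ in range(steps):
        # Each worker compresses its local gradient; the synchronous step
        # then averages the compressed gradients (stand-in for all-reduce).
        grads = [top_k(w - t, k_keep) for t in targets]
        w -= lr * np.mean(grads, axis=0)

    print("model after compressed synchronous SGD:", w)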
Transformer models have emerged as potent solutions to a wide array of multidisciplinary challenges. The deployment of Transformer architectures is significantly hindered by their extensive computational and memory requirements, necessitating the rel…
External link:
http://arxiv.org/abs/2407.02081
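To make the memory pressure concrete: training a dense Transformer keeps weights, gradients, and optimizer state resident at once, which already exceeds a single accelerator's memory at moderate scale. A back-of-the-envelope estimate under assumed sizes (7B parameters, mixed precision, Adam); none of these figures come from the paper:

    # Back-of-the-envelope training-memory estimate for a dense Transformer.
    # Parameter count, precision, and optimizer are illustrative assumptions.
    params = 7e9                 # assumed 7B-parameter model
    weight_bytes = 2             # bf16/fp16 weights
    grad_bytes = 2               # bf16/fp16 gradients
    optim_bytes = 8              # Adam first/second moments in fp32

    total_gb = params * (weight_bytes + grad_bytes + optim_bytes) / 1e9
    print(f"~{total_gb:.0f} GB for model state alone, before activations")

At roughly 84 GB of model state before counting activations, an assumed 7B model already exceeds a single 80 GB accelerator, which is why offloading and model-parallel strategies come into play.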
In the area of large-scale training of graph embeddings, effective training frameworks and partitioning methods are critical for handling large networks. However, they face two major challenges: 1) existing synchronized distributed frameworks require…
External link:
http://arxiv.org/abs/2409.09887
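The partitioning problem mentioned here is about assigning the nodes of a large graph to workers while limiting the number of edges that cross partitions, since each cut edge implies cross-worker communication during embedding training. A toy sketch that hash-partitions nodes and measures the resulting edge cut (the random graph and partition count are illustrative assumptions):

    import numpy as np

    # Hash-partition the nodes of a random graph across workers and count
    # cut edges; every cut edge means cross-worker traffic during training.
    # Graph size and partition count are illustrative assumptions.
    rng = np.random.default_rng(2)
    num_nodes, num_edges, num_parts = 1_000, 5_000, 4

    edges = rng.integers(0, num_nodes, size=(num_edges, 2))
    partition = np.arange(num_nodes) % num_parts   # simple modulo assignment

    cut = int(np.sum(partition[edges[:, 0]] != partition[edges[:, 1]]))
    print(f"cut edges: {cut}/{num_edges} ({100 * cut / num_edges:.1f}% cross-worker)")

Better partitioners try to push that percentage down while keeping the partitions balanced, which is exactly the tension between communication cost and load balance that such frameworks manage.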
Heterogeneous Graph Neural Networks (HGNNs) leverage diverse semantic relationships in Heterogeneous Graphs (HetGs) and have demonstrated remarkable learning performance in various applications. However, current distributed GNN training systems often…
External link:
http://arxiv.org/abs/2408.09697
A number of production deep learning clusters have attempted to explore inference hardware for DNN training, at the off-peak serving hours with many inference GPUs idling. Conducting DNN training with a combination of heterogeneous training and infer…
External link:
http://arxiv.org/abs/2407.02327