Showing 1 - 10 of 21 for search: '"RASHIDI, SAEED"'
Distributed Deep Neural Network (DNN) training is a technique to reduce training overhead by distributing the training tasks across multiple accelerators according to a parallelization strategy. However, high-performance compute and interconnects ...
External link:
http://arxiv.org/abs/2406.19580
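The snippet above refers to parallelization strategies for spreading training across accelerators. As a minimal sketch only, assuming the simplest such strategy (data parallelism) and made-up gradient values, and not code from the paper:

    # Data parallelism: each accelerator computes gradients on its own data shard,
    # then every accelerator applies the same averaged update (a simulated all-reduce).
    def allreduce_average(per_worker_grads):
        num_workers = len(per_worker_grads)
        return [sum(vals) / num_workers for vals in zip(*per_worker_grads)]

    # Hypothetical gradients from 4 accelerators for a 3-parameter model.
    grads = [[0.1, -0.2, 0.3],
             [0.0, -0.1, 0.4],
             [0.2, -0.3, 0.2],
             [0.1, -0.2, 0.3]]
    print([round(g, 3) for g in allreduce_average(grads)])  # [0.1, -0.2, 0.3]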
Author:
Sridharan, Srinivas, Heo, Taekyung, Feng, Louis, Wang, Zhaodong, Bergeron, Matt, Fu, Wenyin, Zheng, Shengbao, Coutinho, Brian, Rashidi, Saeed, Man, Changhai, Krishna, Tushar
Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g., MLPerf, play an essential role in enabling fair comparison across different ...
External link:
http://arxiv.org/abs/2305.14516
Author:
Won, William, Heo, Taekyung, Rashidi, Saeed, Sridharan, Srinivas, Srinivasan, Sudarshan, Krishna, Tushar
As deep learning models and input data scale at an unprecedented rate, moving to distributed training platforms becomes inevitable to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale ...
External link:
http://arxiv.org/abs/2303.14006
Author:
Kadiyala, Divya Kiran, Rashidi, Saeed, Heo, Taekyung, Bambhaniya, Abhimanyu Rajeshkumar, Krishna, Tushar, Daglis, Alexandros
Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train. Designing such clusters to maximize both performance and utilization, so as to amortize their steep cost, is a challenging task requiring ...
External link:
http://arxiv.org/abs/2211.16648
Author:
Khan, Tarannum, Rashidi, Saeed, Sridharan, Srinivas, Shurpali, Pallavi, Akella, Aditya, Krishna, Tushar
RDMA over Converged Ethernet (RoCE) has gained significant traction in datacenter networks due to its compatibility with conventional Ethernet-based fabrics. However, the RDMA protocol is efficient only on (nearly) lossless networks, emphasizing the ...
External link:
http://arxiv.org/abs/2207.10898
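The RoCE snippet above hinges on RDMA's need for a (nearly) lossless fabric: RoCE NICs traditionally recover from loss with go-back-N, so a single drop can force retransmission of up to a full window of in-flight packets. A rough first-order illustration only, with an invented window size and loss rate, and not the paper's model:

    # Rough first-order estimate of wasted traffic under go-back-N loss recovery:
    # each dropped packet can trigger retransmission of up to a full window.
    def goback_n_overhead(loss_rate, window_packets):
        return loss_rate * window_packets  # extra packets sent per delivered packet, to first order

    print(goback_n_overhead(0.001, 256))   # ~0.26 extra packets per delivered packet at 0.1% loss
    print(goback_n_overhead(0.0, 256))     # 0.0: a lossless fabric avoids the waste entirely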
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPUs/TPUs). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activations ...
External link:
http://arxiv.org/abs/2110.04478
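The Themis snippet above points at the communication overhead of synchronizing gradients across NPUs. As a back-of-the-envelope sketch using the standard ring all-reduce bandwidth model (not Themis's actual bandwidth-aware scheduling policy; the sizes and link speeds below are made up):

    # Time to all-reduce S bytes of gradients across p NPUs over links of B bytes/s,
    # using the classic ring all-reduce volume 2*(p-1)/p * S and ignoring latency terms.
    def ring_allreduce_time(num_npus, grad_bytes, link_bytes_per_s):
        p = num_npus
        volume = 2 * (p - 1) / p * grad_bytes
        return volume / link_bytes_per_s

    print(ring_allreduce_time(8, 1e9, 100e9))  # 0.0175 s to synchronize 1 GB across 8 NPUs on 100 GB/s links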
Published in:
Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '24)
As model sizes in machine learning continue to scale, distributed training is necessary to accommodate model weights within each device and to reduce training time. However, this comes at the expense of increased communication overhead due to the ...
External link:
http://arxiv.org/abs/2109.11762
Using multiple nodes and parallel computing algorithms has become a principal tool for improving the training and execution times of deep neural networks, as well as for effective collective intelligence in sensor networks. In this paper, we consider the parallel ...
External link:
http://arxiv.org/abs/2008.08289
Author:
Rashidi, Saeed, Denton, Matthew, Sridharan, Srinivas, Srinivasan, Sudarshan, Suresh, Amoghavarsha, Ni, Jade, Krishna, Tushar
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPUs/TPUs) via fast, customized interconnects offering hundreds of GB/s of bandwidth. However, as we identify in this work, driving this bandwidth is ...
External link:
http://arxiv.org/abs/2007.00156
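The snippet above concerns driving interconnect bandwidth during training; one common lever is overlapping each layer's gradient communication with the remaining backward compute. A minimal sketch with stand-in sleeps rather than real kernels or collectives, not the paper's mechanism:

    import threading, time

    def communicate(layer):
        time.sleep(0.05)                      # stand-in for an all-reduce on the interconnect
        print(f"all-reduce for layer {layer} done")

    def backprop(layer):
        time.sleep(0.05)                      # stand-in for the layer's backward compute
        print(f"backward pass for layer {layer} done")

    threads = []
    for layer in reversed(range(4)):          # walk layers back to front
        backprop(layer)
        t = threading.Thread(target=communicate, args=(layer,))
        t.start()                             # this layer's all-reduce overlaps the next layer's compute
        threads.append(t)
    for t in threads:
        t.join()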
Author:
Rashidi, Saeed (saeed.rashidi@gatech.edu), Jalili, Majid (majid@utexas.edu), Sarbazi-Azad, Hamid (azad@ipm.ir)
Published in:
ACM Computing Surveys, Vol. 52, Issue 4 (July 2020), pp. 1-38.