Showing 1 - 7 of 7
for search: '"Pati, Suchita"'
Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device's compute utilization. However, effectively harnessing it on GPUs for important primitives such as general matrix multiplications (GEMM…
External link:
http://arxiv.org/abs/2409.02227
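As a rough illustration of the concurrency the snippet above refers to (not code from the paper), the sketch below issues two independent GEMMs on separate CUDA streams in PyTorch so the hardware may overlap them when a single GEMM does not fill the device; the matrix sizes and stream setup are illustrative assumptions.

# Minimal sketch: two independent GEMMs on separate CUDA streams.
import torch

assert torch.cuda.is_available()
device = torch.device("cuda")

a1 = torch.randn(1024, 1024, device=device)
b1 = torch.randn(1024, 1024, device=device)
a2 = torch.randn(1024, 1024, device=device)
b2 = torch.randn(1024, 1024, device=device)

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()

# Each GEMM is enqueued on its own stream; the GPU is free to
# execute them concurrently if resources allow.
with torch.cuda.stream(s1):
    c1 = a1 @ b1
with torch.cuda.stream(s2):
    c2 = a2 @ b2

torch.cuda.synchronize()  # wait for both streams before using c1 and c2

Whether the two GEMMs actually overlap depends on kernel occupancy and scheduling, which is the kind of question a profiler run would answer.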
Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the number of devices increases. While some distributed t…
External link:
http://arxiv.org/abs/2401.16677
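As a generic illustration of hiding communication behind computation (not the specific technique of the paper above), the sketch below starts an asynchronous all-reduce with torch.distributed and performs an unrelated matrix multiply while the collective is in flight; the NCCL backend, process-group setup, and tensor shapes are assumptions for the example.

# Minimal sketch: overlap an async all-reduce with independent compute.
import torch
import torch.distributed as dist

if not dist.is_initialized():
    # Assumes launch via torchrun with the usual rendezvous env vars.
    dist.init_process_group("nccl")

grad = torch.randn(4096, 4096, device="cuda")
x = torch.randn(4096, 4096, device="cuda")
w = torch.randn(4096, 4096, device="cuda")

# Launch the collective without blocking ...
work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)

# ... and do independent computation while the communication is in flight.
y = x @ w

work.wait()  # ensure the reduced tensor is ready before it is consumed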
Data format innovations have been critical for machine learning (ML) scaling, which in turn fuels ground-breaking ML capabilities. However, even in the presence of low-precision formats, model weights are often stored in both high-precision and low-p…
External link:
http://arxiv.org/abs/2311.05034
Scaling neural network models has delivered dramatic quality gains across ML problems. However, this scaling has increased the reliance on efficient distributed training techniques. Accordingly, as with other distributed computing scenarios, it is im…
External link:
http://arxiv.org/abs/2302.02825
Transfer learning in natural language processing (NLP), as realized using models like BERT (Bi-directional Encoder Representation from Transformer), has significantly improved language representation with models that can tackle challenging language p…
External link:
http://arxiv.org/abs/2104.08335
The ubiquity of deep neural networks (DNNs) continues to rise, making them a crucial application class for hardware optimizations. However, detailed profiling and characterization of DNN training remains difficult as these applications often run for…
External link:
http://arxiv.org/abs/2007.10459
Author:
Lew, Jonathan, Shah, Deval, Pati, Suchita, Cattell, Shaylin, Zhang, Mengchi, Sandhupatla, Amruth, Ng, Christopher, Goli, Negar, Sinclair, Matthew D., Rogers, Timothy G., Aamodt, Tor
Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels included in N…
External link:
http://arxiv.org/abs/1811.08933