Showing 1 - 10 of 97 for search: '"Mahajan, Divya"'
Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization of determining the optimal architecture and de
External link:
http://arxiv.org/abs/2407.13143
Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. Software and runtime systems for GPUs are optimized to better utilize the stream multiprocessors, on-c
External link:
http://arxiv.org/abs/2407.13853
In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor model par
External link:
http://arxiv.org/abs/2404.14632
Training recommendation models poses significant challenges regarding resource utilization and performance. Prior research has proposed an approach that categorizes embeddings into popular and non-popular classes to reduce the training time for recomm
External link:
http://arxiv.org/abs/2404.04270
Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes. These indexes use the mapping information as training data. Learned indexes require frequent retrainings of their m
External link:
http://arxiv.org/abs/2403.11472
Author:
Heo, Guseul, Lee, Sangyeop, Cho, Jaehong, Choi, Hyunmin, Lee, Sanghyeon, Ham, Hyungkyu, Kim, Gwangsun, Mahajan, Divya, Park, Jongse
Published in:
ASPLOS 2024
Modern transformer-based Large Language Models (LLMs) are constructed with a series of decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi-head attention, and (3) feed-forward networks. In batched processing, QKV
External link:
http://arxiv.org/abs/2403.00579
Recommendation models are vital in delivering personalized user experiences by leveraging the correlation between multiple input features. However, deep learning-based recommendation models often face challenges due to evolving user behaviour and ite
External link:
http://arxiv.org/abs/2308.14902
Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environm
External link:
http://arxiv.org/abs/2307.02623
Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines
External link:
http://arxiv.org/abs/2204.05436
Author:
Mahajan, Divya, author
Published in:
Contemporary Studies of Risks in Emerging Technology, Part B