Showing 1 - 10 of 10 for search: '"tyagi, Sahil"'
Author:
Tyagi, Sahil, Swany, Martin
Published in:
2023 IEEE International Conference on Big Data (BigData), 925-935
Gradient compression alleviates expensive communication in distributed deep learning by sending fewer values and their corresponding indices, typically via Allgather (AG). Training with a high compression ratio (CR) achieves accuracy comparable to DenseSGD, …
External link:
http://arxiv.org/abs/2312.02493
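The mechanism sketched in the abstract above (keep only the largest gradient values, ship them together with their indices, and exchange them via Allgather) can be illustrated in a few lines. This is a minimal NumPy sketch under assumed names; the in-process loop only simulates what an Allgather collective would do across real workers, and it is not the paper's implementation:

# Minimal sketch of sparse gradient compression with a simulated Allgather.
# All names are illustrative assumptions, not the paper's code.
import numpy as np

def compress_topk(grad, cr):
    # Keep roughly size/cr of the largest-magnitude entries (CR = compression ratio).
    k = max(1, int(grad.size / cr))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return grad[idx], idx  # fewer values + their corresponding indices

def aggregate(payloads, size):
    # Simulated Allgather: every worker receives all (values, indices) pairs
    # and rebuilds a dense, averaged gradient locally.
    dense = np.zeros(size)
    for values, idx in payloads:
        np.add.at(dense, idx, values)
    return dense / len(payloads)

rng = np.random.default_rng(0)
local_grads = [rng.standard_normal(1000) for _ in range(4)]   # 4 workers
payloads = [compress_topk(g, cr=100) for g in local_grads]    # 100x compression
avg_grad = aggregate(payloads, size=1000)

With CR = 100, each worker communicates roughly 1% of its gradient entries per step, which is the traffic-versus-fidelity trade-off the abstract alludes to.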
Author:
Tyagi, Sahil, Swany, Martin
Published in:
Tyagi, S., & Swany, M. (2023). Accelerating Distributed ML Training via Selective Synchronization. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 1-12
In distributed training, deep neural networks (DNNs) are launched over multiple workers concurrently, and in bulk-synchronous parallel (BSP) training their local updates are aggregated on every step. However, BSP does not scale out linearly due to high communication …
External link:
http://arxiv.org/abs/2307.07950
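As a rough, assumption-laden illustration of the bulk-synchronous step described above, and of how a "selective" variant might skip some synchronizations, consider the sketch below. The skip criterion is a placeholder and not the policy proposed in the paper:

# Contrast of BSP aggregation with a selective variant that occasionally
# skips communication. The threshold rule is an assumed placeholder.
import numpy as np

def bsp_step(local_updates):
    # BSP: all workers' updates are averaged on every step.
    return np.mean(local_updates, axis=0)

def selective_step(local_updates, prev_global, threshold=1e-3):
    # Skip the costly aggregation when local updates barely change the model.
    drift = max(np.linalg.norm(u) for u in local_updates)
    if drift < threshold:
        return prev_global, False   # no communication this step
    return np.mean(local_updates, axis=0), True

rng = np.random.default_rng(1)
updates = [1e-4 * rng.standard_normal(10) for _ in range(8)]   # 8 workers
new_global, synced = selective_step(updates, prev_global=np.zeros(10))
print("synchronized this step:", synced)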
Author:
Tyagi, Sahil, Sharma, Prateek
Current techniques and systems for distributed model training mostly assume that clusters are composed of homogeneous servers with constant resource availability. However, cluster heterogeneity is pervasive in computing infrastructure and is a fundamental …
External link:
http://arxiv.org/abs/2305.12213
Author:
Tyagi, Sahil, Swany, Martin
Published in:
Tyagi, S., & Swany, M. (2023). GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training. 2023 IEEE 16th International Conference on Cloud Computing (CLOUD), 319-329
Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of the data and aggregate updates to produce a globally shared model. The periodic synchronization at each iteration incurs considerable overhead …
External link:
http://arxiv.org/abs/2305.12201
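A back-of-the-envelope estimate shows why the per-iteration synchronization mentioned above can dominate step time; the figures here are illustrative assumptions, not measurements from the paper:

# Toy model of one DDP iteration: local compute plus a full-gradient exchange.
model_params = 100e6          # assumed 100M-parameter model
bytes_per_param = 4           # fp32 gradients
bandwidth = 10e9 / 8          # assumed 10 Gbps link, in bytes/s
compute_time = 0.05           # assumed seconds of local compute per iteration

sync_time = (model_params * bytes_per_param) / bandwidth
step_time = compute_time + sync_time
print(f"sync share of each step: {sync_time / step_time:.0%}")  # ~86% here

Shrinking the communicated payload, for example by compressing gradients, is one way to push that share down, which is the setting adaptive compression schemes target.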
Author:
Tyagi, Sahil, Sharma, Prateek
Published in:
2023 IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
While the pay-as-you-go nature of cloud virtual machines (VMs) makes it easy to spin up large clusters for training ML models, it can also lead to ballooning costs. The hundreds of virtual machine sizes offered by cloud platforms also make it extremely …
External link:
http://arxiv.org/abs/2303.06659
Author:
Tyagi, Sahil, Swany, Martin
Published in:
Tyagi, S., & Swany, M. (2022). ScaDLES: Scalable Deep Learning over Streaming data at the Edge. 2022 IEEE International Conference on Big Data (Big Data), 2113-2122
Distributed deep learning (DDL) training systems are designed for cloud and data-center environments that assume homogeneous compute resources, high network bandwidth, sufficient memory and storage, as well as independent and identically distributed (IID) data …
External link:
http://arxiv.org/abs/2301.08897
Author:
Katyal, Garima, Pathak, Anuj, Raghavendra Rao, N.G., Grover, Parul, Sharma, Vaibhav, Malik, Anshika, Tyagi, Sahil, Rawat, Aryan Prakash, Singh, Sanjay, Maurya, Aarati
Published in:
Materials Today: Proceedings, 2024, 103:423-431
Published in:
Proceedings of the 13th IEEE International Conference on e-Science (e-Science), Auckland, New Zealand, 2017
Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data …
External link:
http://arxiv.org/abs/1709.03332
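The continuous dataflows mentioned above, operators composed over unbounded streams, can be mimicked with plain Python generators. This toy sketch only stands in for Storm/Spark Streaming operators, and all names in it are assumptions rather than anything from the paper:

# Toy source -> filter -> sink dataflow over simulated sensor readings.
import random

def sensor_source(n):
    # Emit a (finite, for the example) stream of sensor readings.
    for i in range(n):
        yield {"sensor": i % 4, "temp": 20 + random.random() * 10}

def threshold_filter(stream, limit=27.0):
    # Continuously forward only readings above a temperature limit.
    for reading in stream:
        if reading["temp"] > limit:
            yield reading

def sink(stream):
    # Terminal operator: act on each alert as it arrives.
    for alert in stream:
        print("alert:", alert)

sink(threshold_filter(sensor_source(100)))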
Author:
Tyagi, Sahil, Swany, Martin
Published in:
2022 IEEE International Conference on Big Data (Big Data).
Distributed deep learning (DDL) training systems are designed for cloud and data-center environments that assume homogeneous compute resources, high network bandwidth, sufficient memory and storage, as well as independent and identically distributed (IID) data …
Academic article
This result cannot be displayed to unauthenticated users; log in to view it.