Showing 1 - 10 of 206 for search: '"Ganger, Gregory"'
Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use GPUs inefficiently, especially at large scale, due to idle GPU time caused by pipeline bubbles… (see the sketch below)
External link:
http://arxiv.org/abs/2410.07192
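The bubble cost this snippet refers to has a standard back-of-the-envelope form for GPipe-style schedules: with p stages and m microbatches, each stage idles for roughly (p - 1) microbatch slots per iteration. A minimal Python sketch of that accounting (standard pipeline-parallel arithmetic, not taken from the paper):

# Idle ("bubble") fraction of a GPipe-style pipeline schedule:
# with p stages and m microbatches, bubble = (p - 1) / (m + p - 1).
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of GPU time lost to pipeline bubbles per iteration."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# The fraction grows with scale, which is the inefficiency the paper targets.
for p in (4, 16, 64):
    print(f"{p:>3} stages, 32 microbatches -> {bubble_fraction(p, 32):.1%} idle")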
Author:
Jeon, Byungsoo, Wu, Mengdi, Cao, Shiyi, Kim, Sunghyun, Park, Sunghyun, Aggarwal, Neeraj, Unger, Colin, Arfeen, Daiyaan, Liao, Peiyuan, Miao, Xupeng, Alizadeh, Mohammad, Ganger, Gregory R., Chen, Tianqi, Jia, Zhihao
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages… (see the sketch below)
External link:
http://arxiv.org/abs/2406.17145
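The stage partitioning this snippet describes can be illustrated with a toy sequential splitter: divide a chain of layers into k contiguous stages of roughly equal cost. This is a greedy sketch of the general idea, not GraphPipe's graph-aware partitioner:

# Greedily split per-layer costs into k contiguous, non-empty stages.
# Not optimal; real systems also model communication and memory.
def partition(costs, k):
    target = sum(costs) / k
    stages, current, acc = [], [], 0.0
    for i, c in enumerate(costs):
        current.append(i)
        acc += c
        layers_left = len(costs) - i - 1
        stages_left = k - len(stages) - 1
        must_cut = layers_left == stages_left   # one layer left per stage
        want_cut = acc >= target and layers_left >= stages_left
        if stages_left > 0 and (must_cut or want_cut):
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)
    return stages

# Example: 8 layers with increasing cost, split across 4 pipeline stages.
print(partition([1, 1, 2, 2, 4, 4, 8, 8], k=4))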
Author:
Kadekodi, Saurabh, Maturana, Francisco, Subramanya, Suhas Jayaram, Yang, Juncheng, Rashmi, K. V., Ganger, Gregory R.
Published in:
14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020), pp. 369-385
Data redundancy provides resilience in large-scale storage clusters, but imposes significant cost overhead. Substantial space-savings can be realized by tuning redundancy schemes to observed disk failure rates. However, prior design proposals for such tuning… (see the sketch below)
External link:
http://arxiv.org/abs/2103.08191
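The tuning this snippet describes can be illustrated with a toy scheme chooser: given an observed annualized failure rate (AFR), keep the most space-efficient (data, parity) erasure-code scheme whose chance of data loss within one repair window stays under a target. A binomial model with independent failures, illustrative only, not the paper's actual analysis:

from math import comb

def loss_prob(n_data, n_parity, afr, repair_window_days=1.0):
    """P(more than n_parity of the n_data + n_parity disks fail in the window)."""
    n = n_data + n_parity
    p = afr * repair_window_days / 365.0  # per-disk failure prob in the window
    return sum(comb(n, f) * p**f * (1 - p)**(n - f)
               for f in range(n_parity + 1, n + 1))

def choose_scheme(candidates, afr, target=1e-12):
    safe = [(d, q) for d, q in candidates if loss_prob(d, q, afr) <= target]
    # Most space-efficient = lowest storage overhead (d + q) / d.
    return min(safe, key=lambda s: (s[0] + s[1]) / s[0]) if safe else None

# With a low observed AFR, a wider (cheaper) scheme becomes acceptable.
print(choose_scheme([(6, 3), (10, 4), (20, 4)], afr=0.02))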
Author:
Qiao, Aurick, Choe, Sang Keun, Subramanya, Suhas Jayaram, Neiswanger, Willie, Ho, Qirong, Zhang, Hao, Ganger, Gregory R., Xing, Eric P.
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. Most existing schedulers expect users to specify the number of resources… (see the sketch below)
External link:
http://arxiv.org/abs/2008.12260
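Pollux's central quantity, as described in the paper, is goodput: system throughput scaled by statistical efficiency, co-optimized per job and cluster-wide. A toy sketch of that objective follows; the throughput and efficiency models below are invented placeholders, not Pollux's fitted models:

# Goodput = throughput x statistical efficiency (the paper's key metric).
def throughput(replicas, batch_size):
    # Hypothetical model: near-linear scaling with diminishing returns.
    return replicas * batch_size / (1.0 + 0.05 * replicas)

def statistical_efficiency(batch_size, gradient_noise_scale=512):
    # Hypothetical model: extra samples help less past the noise scale.
    return gradient_noise_scale / (gradient_noise_scale + batch_size)

def goodput(replicas, batch_size):
    return throughput(replicas, batch_size) * statistical_efficiency(batch_size)

# Pick the resource/batch configuration with the highest goodput.
best = max(((r, b) for r in (1, 2, 4, 8) for b in (128, 256, 512, 1024)),
           key=lambda cfg: goodput(*cfg))
print("best (replicas, batch_size):", best)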
Vilamb provides efficient asynchronous system redundancy for direct access (DAX) non-volatile memory (NVM) storage. Production storage deployments often use system redundancy in the form of page checksums and cross-page parity. State-of-the-art solutions… (see the sketch below)
External link:
http://arxiv.org/abs/2004.09619
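A minimal sketch of asynchronous system redundancy in the spirit of this snippet: writes only mark pages dirty, and a background pass later refreshes per-page checksums and cross-page parity, trading a short window of reduced protection for much cheaper writes. Names and structure are illustrative, not Vilamb's implementation:

import zlib

PAGE = 4096
pages = [bytearray(PAGE) for _ in range(8)]
checksums = [zlib.crc32(p) for p in pages]
parity = bytearray(PAGE)                  # XOR of all pages
dirty = set()

def write(page_no, offset, data):
    pages[page_no][offset:offset + len(data)] = data
    dirty.add(page_no)                    # defer redundancy update (the async part)

def refresh_redundancy():
    """Background pass: recompute checksums for dirty pages, then parity."""
    for page_no in dirty:
        checksums[page_no] = zlib.crc32(pages[page_no])
    dirty.clear()
    parity[:] = bytearray(PAGE)
    for p in pages:
        for i in range(PAGE):
            parity[i] ^= p[i]

write(3, 0, b"hello")
refresh_redundancy()
assert zlib.crc32(pages[3]) == checksums[3]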
Author:
Jiang, Angela H., Wong, Daniel L. -K., Zhou, Giulio, Andersen, David G., Dean, Jeffrey, Ganger, Gregory R., Joshi, Gauri, Kaminsky, Michael, Kozuch, Michael, Lipton, Zachary C., Pillai, Padmanabhan
This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide… (see the sketch below)
External link:
http://arxiv.org/abs/1910.00762
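The mechanism in this snippet is easy to sketch in PyTorch: run the forward pass on the full batch, then backpropagate only the highest-loss examples. Selection here is a plain top-k for brevity; the paper's selector is probabilistic, based on each example's position in a CDF of recent losses:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(20, 10)           # stand-in for a real DNN
opt = torch.optim.SGD(model.parameters(), lr=0.1)
keep_frac = 0.25                          # backprop only the hardest 25%

def selective_backprop_step(x, y):
    logits = model(x)                     # forward pass on the whole batch
    losses = F.cross_entropy(logits, y, reduction="none")
    k = max(1, int(keep_frac * len(losses)))
    idx = losses.topk(k).indices          # highest-loss examples
    opt.zero_grad()
    losses[idx].mean().backward()         # gradients only from selected examples
    opt.step()

selective_backprop_step(torch.randn(64, 20), torch.randint(0, 10, (64,)))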
Tvarak efficiently implements system-level redundancy for direct-access (DAX) NVM storage. Production storage systems complement device-level ECC (which covers media errors) with system checksums and cross-device parity. This system-level redundancy… (see the sketch below)
External link:
http://arxiv.org/abs/1908.09922
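The read path that this kind of system-level redundancy enables can be sketched as: verify the page checksum on every read, and on a mismatch rebuild the page from the surviving pages plus XOR parity. This illustrates the technique named in the snippet, not Tvarak's controller design:

import zlib

def read_page(pages, checksums, parity, page_no):
    data = pages[page_no]
    if zlib.crc32(data) == checksums[page_no]:
        return bytes(data)
    # Corruption detected (missed by device-level ECC): rebuild from parity.
    rebuilt = bytearray(parity)
    for i, p in enumerate(pages):
        if i != page_no:
            for j in range(len(rebuilt)):
                rebuilt[j] ^= p[j]
    assert zlib.crc32(rebuilt) == checksums[page_no], "unrecoverable page"
    pages[page_no] = rebuilt              # repair in place
    return bytes(rebuilt)

# Demo: three pages protected by CRCs and one XOR parity page.
pages = [bytearray(b"A" * 16), bytearray(b"B" * 16), bytearray(b"C" * 16)]
checksums = [zlib.crc32(p) for p in pages]
parity = bytearray(16)
for p in pages:
    for j in range(16):
        parity[j] ^= p[j]
pages[1][0] ^= 0xFF                       # inject silent corruption
assert read_page(pages, checksums, parity, 1) == b"B" * 16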
Author:
Ratner, Alexander, Alistarh, Dan, Alonso, Gustavo, Andersen, David G., Bailis, Peter, Bird, Sarah, Carlini, Nicholas, Catanzaro, Bryan, Chayes, Jennifer, Chung, Eric, Dally, Bill, Dean, Jeff, Dhillon, Inderjit S., Dimakis, Alexandros, Dubey, Pradeep, Elkan, Charles, Fursin, Grigori, Ganger, Gregory R., Getoor, Lise, Gibbons, Phillip B., Gibson, Garth A., Gonzalez, Joseph E., Gottschlich, Justin, Han, Song, Hazelwood, Kim, Huang, Furong, Jaggi, Martin, Jamieson, Kevin, Jordan, Michael I., Joshi, Gauri, Khalaf, Rania, Knight, Jason, Konečný, Jakub, Kraska, Tim, Kumar, Arun, Kyrillidis, Anastasios, Lakshmiratan, Aparna, Li, Jing, Madden, Samuel, McMahan, H. Brendan, Meijer, Erik, Mitliagkas, Ioannis, Monga, Rajat, Murray, Derek, Olukotun, Kunle, Papailiopoulos, Dimitris, Pekhimenko, Gennady, Rekatsinas, Theodoros, Rostamizadeh, Afshin, Ré, Christopher, De Sa, Christopher, Sedghi, Hanie, Sen, Siddhartha, Smith, Virginia, Smola, Alex, Song, Dawn, Sparks, Evan, Stoica, Ion, Sze, Vivienne, Udell, Madeleine, Vanschoren, Joaquin, Venkataraman, Shivaram, Vinayak, Rashmi, Weimer, Markus, Wilson, Andrew Gordon, Xing, Eric, Zaharia, Matei, Zhang, Ce, Talwalkar, Ameet
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development…
External link:
http://arxiv.org/abs/1904.03257
MLtuner automatically tunes settings for training tunables (such as the learning rate, the momentum, the mini-batch size, and the data staleness bound) that have a significant impact on large-scale machine learning (ML) performance. Traditionally, these tunables… (see the sketch below)
External link:
http://arxiv.org/abs/1803.07445
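At a high level the snippet describes trial-based tuning: try candidate tunable settings in short trial runs and keep the one that makes the fastest training progress. A minimal sketch; run_trial is a hypothetical placeholder you would implement against your own training system:

import itertools
import random

def run_trial(settings, steps=100):
    """Placeholder: run `steps` of training, return measured progress/sec."""
    random.seed(hash(frozenset(settings.items())) & 0xFFFF)
    return random.random()                # stand-in for a real measurement

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "momentum": [0.0, 0.9],
    "batch_size": [64, 256],
}
candidates = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]
best = max(candidates, key=run_trial)
print("chosen tunables:", best)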
Author:
Ganger, Gregory R., Patt, Yale N.
Published in:
ACM SIGMETRICS Performance Evaluation Review, Issue: Preprints, pp. 86-97 (12 pages)