Showing 1 - 10 of 206 for search: '"Ganger, Gregory"'
Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use GPUs inefficiently, especially at large scale, due to idle GPU time caused by pipeline bubbles… (see the sketch below)
External link:
http://arxiv.org/abs/2410.07192
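The bubble cost this snippet refers to has a standard back-of-the-envelope form for GPipe-style schedules: with p stages and m microbatches, each stage idles for roughly (p - 1) microbatch slots per iteration. A minimal Python sketch of that accounting (standard pipeline-parallel arithmetic, not taken from the paper):

# Idle ("bubble") fraction of a GPipe-style pipeline schedule:
# with p stages and m microbatches, bubble = (p - 1) / (m + p - 1).
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of GPU time lost to pipeline bubbles per iteration."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# The fraction grows with scale, which is the inefficiency the paper targets.
for p in (4, 16, 64):
    print(f"{p:>3} stages, 32 microbatches -> {bubble_fraction(p, 32):.1%} idle")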
Author:
Jeon, Byungsoo, Wu, Mengdi, Cao, Shiyi, Kim, Sunghyun, Park, Sunghyun, Aggarwal, Neeraj, Unger, Colin, Arfeen, Daiyaan, Liao, Peiyuan, Miao, Xupeng, Alizadeh, Mohammad, Ganger, Gregory R., Chen, Tianqi, Jia, Zhihao
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages… (see the sketch below)
External link:
http://arxiv.org/abs/2406.17145
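The stage partitioning this snippet describes can be illustrated with a toy sequential splitter: divide a chain of layers into k contiguous stages of roughly equal cost. This is a greedy sketch of the general idea, not GraphPipe's graph-aware partitioner:

# Greedily split per-layer costs into k contiguous, non-empty stages.
# Not optimal; real systems also model communication and memory.
def partition(costs, k):
    target = sum(costs) / k
    stages, current, acc = [], [], 0.0
    for i, c in enumerate(costs):
        current.append(i)
        acc += c
        layers_left = len(costs) - i - 1
        stages_left = k - len(stages) - 1
        must_cut = layers_left == stages_left   # one layer left per stage
        want_cut = acc >= target and layers_left >= stages_left
        if stages_left > 0 and (must_cut or want_cut):
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)
    return stages

# Example: 8 layers with increasing cost, split across 4 pipeline stages.
print(partition([1, 1, 2, 2, 4, 4, 8, 8], k=4))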
Author:
Kadekodi, Saurabh, Maturana, Francisco, Subramanya, Suhas Jayaram, Yang, Juncheng, Rashmi, K. V., Ganger, Gregory R.
Published in:
14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020), pp. 369-385
Data redundancy provides resilience in large-scale storage clusters, but imposes significant cost overhead. Substantial space-savings can be realized by tuning redundancy schemes to observed disk failure rates. However, prior design proposals for such tuning… (see the sketch below)
External link:
http://arxiv.org/abs/2103.08191
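The tuning this snippet describes can be illustrated with a toy scheme chooser: given an observed annualized failure rate (AFR), keep the most space-efficient (data, parity) erasure-code scheme whose chance of data loss within one repair window stays under a target. A binomial model with independent failures, illustrative only, not the paper's actual analysis:

from math import comb

def loss_prob(n_data, n_parity, afr, repair_window_days=1.0):
    """P(more than n_parity of the n_data + n_parity disks fail in the window)."""
    n = n_data + n_parity
    p = afr * repair_window_days / 365.0  # per-disk failure prob in the window
    return sum(comb(n, f) * p**f * (1 - p)**(n - f)
               for f in range(n_parity + 1, n + 1))

def choose_scheme(candidates, afr, target=1e-12):
    safe = [(d, q) for d, q in candidates if loss_prob(d, q, afr) <= target]
    # Most space-efficient = lowest storage overhead (d + q) / d.
    return min(safe, key=lambda s: (s[0] + s[1]) / s[0]) if safe else None

# With a low observed AFR, a wider (cheaper) scheme becomes acceptable.
print(choose_scheme([(6, 3), (10, 4), (20, 4)], afr=0.02))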
Author:
Qiao, Aurick, Choe, Sang Keun, Subramanya, Suhas Jayaram, Neiswanger, Willie, Ho, Qirong, Zhang, Hao, Ganger, Gregory R., Xing, Eric P.
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. Most existing schedulers expect users to specify the number of resources… (see the sketch below)
External link:
http://arxiv.org/abs/2008.12260
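Pollux's central quantity, as described in the paper, is goodput: system throughput scaled by statistical efficiency, co-optimized per job and cluster-wide. A toy sketch of that objective follows; the throughput and efficiency models below are invented placeholders, not Pollux's fitted models:

# Goodput = throughput x statistical efficiency (the paper's key metric).
def throughput(replicas, batch_size):
    # Hypothetical model: near-linear scaling with diminishing returns.
    return replicas * batch_size / (1.0 + 0.05 * replicas)

def statistical_efficiency(batch_size, gradient_noise_scale=512):
    # Hypothetical model: extra samples help less past the noise scale.
    return gradient_noise_scale / (gradient_noise_scale + batch_size)

def goodput(replicas, batch_size):
    return throughput(replicas, batch_size) * statistical_efficiency(batch_size)

# Pick the resource/batch configuration with the highest goodput.
best = max(((r, b) for r in (1, 2, 4, 8) for b in (128, 256, 512, 1024)),
           key=lambda cfg: goodput(*cfg))
print("best (replicas, batch_size):", best)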
Vilamb provides efficient asynchronous system redundancy for direct access (DAX) non-volatile memory (NVM) storage. Production storage deployments often use system redundancy in the form of page checksums and cross-page parity. State-of-the-art solutions… (see the sketch below)
External link:
http://arxiv.org/abs/2004.09619
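A minimal sketch of asynchronous system redundancy in the spirit of this snippet: writes only mark pages dirty, and a background pass later refreshes per-page checksums and cross-page parity, trading a short window of reduced protection for much cheaper writes. Names and structure are illustrative, not Vilamb's implementation:

import zlib

PAGE = 4096
pages = [bytearray(PAGE) for _ in range(8)]
checksums = [zlib.crc32(p) for p in pages]
parity = bytearray(PAGE)                  # XOR of all pages
dirty = set()

def write(page_no, offset, data):
    pages[page_no][offset:offset + len(data)] = data
    dirty.add(page_no)                    # defer redundancy update (the async part)

def refresh_redundancy():
    """Background pass: recompute checksums for dirty pages, then parity."""
    for page_no in dirty:
        checksums[page_no] = zlib.crc32(pages[page_no])
    dirty.clear()
    parity[:] = bytearray(PAGE)
    for p in pages:
        for i in range(PAGE):
            parity[i] ^= p[i]

write(3, 0, b"hello")
refresh_redundancy()
assert zlib.crc32(pages[3]) == checksums[3]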
Author:
Jiang, Angela H., Wong, Daniel L. -K., Zhou, Giulio, Andersen, David G., Dean, Jeffrey, Ganger, Gregory R., Joshi, Gauri, Kaminsky, Michael, Kozuch, Michael, Lipton, Zachary C., Pillai, Padmanabhan
This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide… (see the sketch below)
External link:
http://arxiv.org/abs/1910.00762
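The mechanism in this snippet is easy to sketch in PyTorch: run the forward pass on the full batch, then backpropagate only the highest-loss examples. Selection here is a plain top-k for brevity; the paper's selector is probabilistic, based on each example's position in a CDF of recent losses:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(20, 10)           # stand-in for a real DNN
opt = torch.optim.SGD(model.parameters(), lr=0.1)
keep_frac = 0.25                          # backprop only the hardest 25%

def selective_backprop_step(x, y):
    logits = model(x)                     # forward pass on the whole batch
    losses = F.cross_entropy(logits, y, reduction="none")
    k = max(1, int(keep_frac * len(losses)))
    idx = losses.topk(k).indices          # highest-loss examples
    opt.zero_grad()
    losses[idx].mean().backward()         # gradients only from selected examples
    opt.step()

selective_backprop_step(torch.randn(64, 20), torch.randint(0, 10, (64,)))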
Tvarak efficiently implements system-level redundancy for direct-access (DAX) NVM storage. Production storage systems complement device-level ECC (which covers media errors) with system checksums and cross-device parity. This system-level redundancy… (see the sketch below)
External link:
http://arxiv.org/abs/1908.09922
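The read path that this kind of system-level redundancy enables can be sketched as: verify the page checksum on every read, and on a mismatch rebuild the page from the surviving pages plus XOR parity. This illustrates the technique named in the snippet, not Tvarak's controller design:

import zlib

def read_page(pages, checksums, parity, page_no):
    data = pages[page_no]
    if zlib.crc32(data) == checksums[page_no]:
        return bytes(data)
    # Corruption detected (missed by device-level ECC): rebuild from parity.
    rebuilt = bytearray(parity)
    for i, p in enumerate(pages):
        if i != page_no:
            for j in range(len(rebuilt)):
                rebuilt[j] ^= p[j]
    assert zlib.crc32(rebuilt) == checksums[page_no], "unrecoverable page"
    pages[page_no] = rebuilt              # repair in place
    return bytes(rebuilt)

# Demo: three pages protected by CRCs and one XOR parity page.
pages = [bytearray(b"A" * 16), bytearray(b"B" * 16), bytearray(b"C" * 16)]
checksums = [zlib.crc32(p) for p in pages]
parity = bytearray(16)
for p in pages:
    for j in range(16):
        parity[j] ^= p[j]
pages[1][0] ^= 0xFF                       # inject silent corruption
assert read_page(pages, checksums, parity, 1) == b"B" * 16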
Author:
Ratner, Alexander, Alistarh, Dan, Alonso, Gustavo, Andersen, David G., Bailis, Peter, Bird, Sarah, Carlini, Nicholas, Catanzaro, Bryan, Chayes, Jennifer, Chung, Eric, Dally, Bill, Dean, Jeff, Dhillon, Inderjit S., Dimakis, Alexandros, Dubey, Pradeep, Elkan, Charles, Fursin, Grigori, Ganger, Gregory R., Getoor, Lise, Gibbons, Phillip B., Gibson, Garth A., Gonzalez, Joseph E., Gottschlich, Justin, Han, Song, Hazelwood, Kim, Huang, Furong, Jaggi, Martin, Jamieson, Kevin, Jordan, Michael I., Joshi, Gauri, Khalaf, Rania, Knight, Jason, Konečný, Jakub, Kraska, Tim, Kumar, Arun, Kyrillidis, Anastasios, Lakshmiratan, Aparna, Li, Jing, Madden, Samuel, McMahan, H. Brendan, Meijer, Erik, Mitliagkas, Ioannis, Monga, Rajat, Murray, Derek, Olukotun, Kunle, Papailiopoulos, Dimitris, Pekhimenko, Gennady, Rekatsinas, Theodoros, Rostamizadeh, Afshin, Ré, Christopher, De Sa, Christopher, Sedghi, Hanie, Sen, Siddhartha, Smith, Virginia, Smola, Alex, Song, Dawn, Sparks, Evan, Stoica, Ion, Sze, Vivienne, Udell, Madeleine, Vanschoren, Joaquin, Venkataraman, Shivaram, Vinayak, Rashmi, Weimer, Markus, Wilson, Andrew Gordon, Xing, Eric, Zaharia, Matei, Zhang, Ce, Talwalkar, Ameet
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development…
External link:
http://arxiv.org/abs/1904.03257
MLtuner automatically tunes settings for training tunables (such as the learning rate, the momentum, the mini-batch size, and the data staleness bound) that have a significant impact on large-scale machine learning (ML) performance. Traditionally, these tunables… (see the sketch below)
External link:
http://arxiv.org/abs/1803.07445
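At a high level the snippet describes trial-based tuning: try candidate tunable settings in short trial runs and keep the one that makes the fastest training progress. A minimal sketch; run_trial is a hypothetical placeholder you would implement against your own training system:

import itertools
import random

def run_trial(settings, steps=100):
    """Placeholder: run `steps` of training, return measured progress/sec."""
    random.seed(hash(frozenset(settings.items())) & 0xFFFF)
    return random.random()                # stand-in for a real measurement

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "momentum": [0.0, 0.9],
    "batch_size": [64, 256],
}
candidates = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]
best = max(candidates, key=run_trial)
print("chosen tunables:", best)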
Author:
Ganger, Gregory R., Patt, Yale N.
Published in:
ACM SIGMETRICS Performance Evaluation Review, Issue: Preprints, pp. 86-97 (12 pages)