Showing 1 - 10 of 99 for search: '"YOUNG, CLIFF"'
Author:
Jouppi, Norman P., Kurian, George, Li, Sheng, Ma, Peter, Nagarajan, Rahul, Nai, Lifeng, Patil, Nishant, Subramanian, Suvinay, Swing, Andy, Towles, Brian, Young, Cliff, Zhou, Xiang, Zhou, Zongwei, Patterson, David
In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain-specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OC…
External link:
http://arxiv.org/abs/2304.01433
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE layers to satisfy the constraints of existing software…
External link:
http://arxiv.org/abs/2211.15841
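The dynamic routing that this abstract refers to can be illustrated with a minimal sketch, assuming top-1 gating; all names and shapes here are illustrative and are not MegaBlocks' actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts = 8, 4, 3
tokens = rng.standard_normal((num_tokens, d_model))
gate_w = rng.standard_normal((d_model, num_experts))

# Each token picks the expert with its highest gate score (top-1 routing).
scores = tokens @ gate_w
assignment = scores.argmax(axis=-1)        # shape: (num_tokens,)

# Experts are independent feed-forward weights; tokens are grouped per expert.
expert_w = rng.standard_normal((num_experts, d_model, d_model))
out = np.empty_like(tokens)
for e in range(num_experts):
    idx = assignment == e                  # variable-sized group per expert
    out[idx] = tokens[idx] @ expert_w[e]
```

Note that each expert receives a different, data-dependent number of tokens; that variability is exactly what forces existing frameworks to pad or drop tokens, which is the limitation the paper targets.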
Author:
Kumar, Sameer, Bradbury, James, Young, Cliff, Wang, Yu Emma, Levskaya, Anselm, Hechtman, Blake, Chen, Dehao, Lee, HyoukJoong, Deveci, Mehmet, Kumar, Naveen, Kanwar, Pankaj, Wang, Shibo, Wanderman-Milne, Skye, Lacy, Steve, Wang, Tao, Oguntebi, Tayo, Zu, Yazhou, Xu, Yuanzhong, Swing, Andy
Recent results in language understanding using neural networks have required training hardware of unprecedented scale, with thousands of chips cooperating on a single training run. This paper presents techniques to scale ML models on the Google TPU Mul…
External link:
http://arxiv.org/abs/2011.03641
Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications…
External link:
http://arxiv.org/abs/2006.10901
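How sparsity reduces both work and memory can be seen in a hand-rolled compressed sparse row (CSR) matrix-vector product; this is a generic sketch of the classic scientific-computing technique the abstract mentions, not code from the paper:

```python
import numpy as np

dense = np.array([[0., 2., 0.],
                  [1., 0., 0.],
                  [0., 0., 3.]])

# Build CSR arrays: nonzero values, their column indices, and row pointers.
values, cols, rowptr = [], [], [0]
for row in dense:
    nz = np.nonzero(row)[0]
    values.extend(row[nz])
    cols.extend(nz)
    rowptr.append(len(values))

def csr_matvec(values, cols, rowptr, x):
    y = np.zeros(len(rowptr) - 1)
    for i in range(len(y)):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += values[k] * x[cols[k]]   # only nonzero entries touched
    return y

x = np.array([1., 1., 1.])
y = csr_matvec(values, cols, rowptr, x)
```

The result matches the dense product while storing and multiplying only the nonzeros; the paper's point is that mapping this irregular access pattern efficiently onto GPUs is the hard part.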
Conventional neural accelerators rely on isolated self-sufficient functional units that perform an atomic operation while communicating the results through an operand delivery-aggregation logic. Each single unit processes all the bits of their operands…
External link:
http://arxiv.org/abs/2004.05333
Author:
Mattson, Peter, Cheng, Christine, Coleman, Cody, Diamos, Greg, Micikevicius, Paulius, Patterson, David, Tang, Hanlin, Wei, Gu-Yeon, Bailis, Peter, Bittorf, Victor, Brooks, David, Chen, Dehao, Dutta, Debojyoti, Gupta, Udit, Hazelwood, Kim, Hock, Andrew, Huang, Xinyuan, Ike, Atsushi, Jia, Bill, Kang, Daniel, Kanter, David, Kumar, Naveen, Liao, Jeffery, Ma, Guokai, Narayanan, Deepak, Oguntebi, Tayo, Pekhimenko, Gennady, Pentecost, Lillian, Reddi, Vijay Janapa, Robie, Taylor, John, Tom St., Tabaru, Tsuguchika, Wu, Carole-Jean, Xu, Lingjie, Yamazaki, Masafumi, Young, Cliff, Zaharia, Matei
Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from o…
External link:
http://arxiv.org/abs/1910.01500
Author:
Shazeer, Noam, Cheng, Youlong, Parmar, Niki, Tran, Dustin, Vaswani, Ashish, Koanantakool, Penporn, Hawkins, Peter, Lee, HyoukJoong, Hong, Mingsheng, Young, Cliff, Sepassi, Ryan, Hechtman, Blake
Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting suffers from…
External link:
http://arxiv.org/abs/1811.02084
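Batch-splitting itself is simple enough to sketch in a few lines; this is a generic illustration of data parallelism with simulated devices, not Mesh-TensorFlow's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4)                 # shared model parameters

def grad(w, x, y):
    # Gradient of mean squared error for a linear model y_hat = x @ w.
    return 2 * x.T @ (x @ w - y) / len(y)

x = rng.standard_normal((8, 4))
y = rng.standard_normal(8)

# Split the batch across 2 simulated devices; each computes a local gradient
# on its shard, then the shards are averaged (an all-reduce).
local = [grad(w, xs, ys) for xs, ys in zip(np.split(x, 2), np.split(y, 2))]
g = np.mean(local, axis=0)
```

The averaged shard gradients equal the full-batch gradient, which is why the same program can run unmodified on every device (SPMD); the paper's concern is what this scheme cannot do, such as splitting models too large for one device.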
Author:
Jouppi, Norman P., Young, Cliff, Patil, Nishant, Patterson, David, Agrawal, Gaurav, Bajwa, Raminder, Bates, Sarah, Bhatia, Suresh, Boden, Nan, Borchers, Al, Boyle, Rick, Cantin, Pierre-luc, Chao, Clifford, Clark, Chris, Coriell, Jeremy, Daley, Mike, Dau, Matt, Dean, Jeffrey, Gelb, Ben, Ghaemmaghami, Tara Vazir, Gottipati, Rajendra, Gulland, William, Hagmann, Robert, Ho, C. Richard, Hogberg, Doug, Hu, John, Hundt, Robert, Hurt, Dan, Ibarz, Julian, Jaffey, Aaron, Jaworski, Alek, Kaplan, Alexander, Khaitan, Harshit, Koch, Andy, Kumar, Naveen, Lacy, Steve, Laudon, James, Law, James, Le, Diemthu, Leary, Chris, Liu, Zhuyuan, Lucke, Kyle, Lundin, Alan, MacKean, Gordon, Maggiore, Adriana, Mahony, Maire, Miller, Kieran, Nagarajan, Rahul, Narayanaswami, Ravi, Ni, Ray, Nix, Kathy, Norrie, Thomas, Omernick, Mark, Penukonda, Narayana, Phelps, Andy, Ross, Jonathan, Ross, Matt, Salek, Amir, Samadiani, Emad, Severn, Chris, Sizikov, Gregory, Snelham, Matthew, Souter, Jed, Steinberg, Dan, Swing, Andy, Tan, Mercedes, Thorson, Gregory, Tian, Bo, Toma, Horia, Tuttle, Erick, Vasudevan, Vijay, Walter, Richard, Wang, Walter, Wilcox, Eric, Yoon, Doe Hyun
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates t…
External link:
http://arxiv.org/abs/1704.04760
Author:
Wu, Yonghui, Schuster, Mike, Chen, Zhifeng, Le, Quoc V., Norouzi, Mohammad, Macherey, Wolfgang, Krikun, Maxim, Cao, Yuan, Gao, Qin, Macherey, Klaus, Klingner, Jeff, Shah, Apurva, Johnson, Melvin, Liu, Xiaobing, Kaiser, Łukasz, Gouws, Stephan, Kato, Yoshikiyo, Kudo, Taku, Kazawa, Hideto, Stevens, Keith, Kurian, George, Patil, Nishant, Wang, Wei, Young, Cliff, Smith, Jason, Riesa, Jason, Rudnick, Alex, Vinyals, Oriol, Corrado, Greg, Hughes, Macduff, Dean, Jeffrey
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computat…
External link:
http://arxiv.org/abs/1609.08144