Showing 1 - 10 of 44 for query: '"SINCLAIR, MATTHEW D."'
Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device's compute utilization. However, effectively harnessing this concurrency on GPUs for important primitives such as general matrix multiplications (GEMMs) …
External link:
http://arxiv.org/abs/2409.02227
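The kind of concurrency this entry describes can be sketched with CUDA streams. The following minimal Python/PyTorch example (an illustration of the general technique, not the paper's method; all sizes and names are placeholders) enqueues two independent GEMMs on separate streams so the GPU may overlap them when neither saturates the device:

import torch  # sketch assumes PyTorch with a CUDA-capable GPU

a1 = torch.randn(1024, 1024, device="cuda")
b1 = torch.randn(1024, 1024, device="cuda")
a2 = torch.randn(1024, 1024, device="cuda")
b2 = torch.randn(1024, 1024, device="cuda")

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()

# Each GEMM is enqueued on its own stream; the hardware is free to run
# them concurrently if compute and memory resources allow.
with torch.cuda.stream(s1):
    c1 = a1 @ b1
with torch.cuda.stream(s2):
    c2 = a2 @ b2

torch.cuda.synchronize()  # wait for both streams to finish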
Large-scale computing systems are increasingly using accelerators such as GPUs to enable peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and scientific computing applications. Given the widespread and growing use of ML …
External link:
http://arxiv.org/abs/2408.11919
Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices, which can reduce scaling efficiency as the number of devices increases. While some distributed techniques …
External link:
http://arxiv.org/abs/2401.16677
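A standard way to mitigate this communication cost, shown below as a generic sketch (not the paper's technique; the function and tensor names are hypothetical, and process-group initialization is elided), is to overlap an asynchronous collective with independent computation:

import torch
import torch.distributed as dist

def reduce_with_overlap(grad_bucket: torch.Tensor, activations: torch.Tensor):
    # Assumes dist.init_process_group(...) has already been called.
    # Launch a non-blocking all-reduce of one gradient bucket.
    work = dist.all_reduce(grad_bucket, op=dist.ReduceOp.SUM, async_op=True)
    out = torch.relu(activations)  # independent work overlapped with comms
    work.wait()  # block only once the reduced gradients are needed
    return grad_bucket, out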
Author:
Upasani, Gaurang, Sinclair, Matthew D., Sampson, Adrian, Ranganathan, Parthasarathy, Patterson, David, Shah, Shaan, Parthasarathy, Nidhi, Jain, Rutwik
Computer Architecture, broadly, involves optimizing hardware and software for current and future processing systems. Although there are several other top venues for publishing Computer Architecture research, including ASPLOS, HPCA, and MICRO, ISCA (the International Symposium on Computer Architecture) …
External link:
http://arxiv.org/abs/2306.03964
Accel-Sim is a widely used computer architecture simulator that models the behavior of modern NVIDIA GPUs in great detail. However, although Accel-Sim and the underlying GPGPU-Sim model many of the features of real GPUs, thus far it has not been able to …
External link:
http://arxiv.org/abs/2304.11136
Scaling neural network models has delivered dramatic quality gains across ML problems. However, this scaling has increased the reliance on efficient distributed training techniques. Accordingly, as with other distributed computing scenarios, it is important …
External link:
http://arxiv.org/abs/2302.02825
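For intuition, scaling efficiency can be quantified with the usual strong-scaling definition (a generic formula, not necessarily the metric used in this paper):

def scaling_efficiency(time_1: float, time_n: float, n: int) -> float:
    # Achieved speedup divided by device count; 1.0 is perfect scaling.
    return (time_1 / time_n) / n

# Example: 100 s/step on 1 GPU vs. 16 s/step on 8 GPUs:
# speedup = 6.25, efficiency = 6.25 / 8 ~= 0.78
print(scaling_efficiency(100.0, 16.0, 8))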
Author:
Sinha, Prasoon, Guliani, Akhil, Jain, Rutwik, Tran, Brandon, Sinclair, Matthew D., Venkataraman, Shivaram
Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured …
External link:
http://arxiv.org/abs/2208.11035
Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands. Traditionally, co…
External link:
http://arxiv.org/abs/2104.11678
Transfer learning in natural language processing (NLP), as realized using models like BERT (Bidirectional Encoder Representations from Transformers), has significantly improved language representation with models that can tackle challenging language p…
External link:
http://arxiv.org/abs/2104.08335
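The transfer-learning workflow this entry refers to typically loads a pretrained encoder and fine-tunes a task head on top. A minimal sketch using the Hugging Face transformers library (an assumed toolchain; the paper's own setup may differ) looks like this:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # pretrained encoder + new task head

inputs = tokenizer("Transfer learning reuses pretrained weights.",
                   return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # gradients also flow into the pretrained encoder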
The ubiquity of deep neural networks (DNNs) continues to rise, making them a crucial application class for hardware optimizations. However, detailed profiling and characterization of DNN training remains difficult, as these applications often run for …
External link:
http://arxiv.org/abs/2007.10459
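One common workaround for this profiling difficulty, sketched below as a generic illustration (not the paper's methodology; the model and sizes are placeholders), is to profile only a few representative iterations with torch.profiler instead of the full run:

import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(64, 512, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(5):  # a few iterations stand in for the full training run
        opt.zero_grad()
        loss = model(x).sum()
        loss.backward()
        opt.step()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))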