3.2 The A100 Datacenter GPU and Ampere Architecture

Authors: Jack Hilaire Choquette, Edward Lee, Vishnu Balan, Ronny Krashinsky, Brucek Khailany
Year: 2021
Subject:
Source: ISSCC
DOI: 10.1109/isscc42613.2021.9365803
Description: The diversity of compute-intensive applications in modern cloud data centers has driven the explosion of GPU-accelerated cloud computing. Such applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, and cloud gaming. The A100 GPU introduces several features targeting these workloads: a 3rd-generation Tensor Core with support for fine-grained sparsity; new BFloat16 (BF16), TensorFloat-32 (TF32), and FP64 datatypes; scale-out support with multi-instance GPU (MIG) virtualization; and scale-up support with a 3rd-generation 50Gbps NVLink I/O interface (NVLink3) and NVSwitch inter-GPU communication. As shown in Fig. 3.2.1, A100 contains 108 Streaming Multiprocessors (SMs) and 6912 CUDA cores. The SMs are fed by a 40MB L2 cache and 1.56TB/s of HBM2 memory bandwidth (BW). At 1.41GHz, A100 provides an effective peak of 1248 TOPS (8b integers), 624 TFLOPS (FP16), and 312 TFLOPS (TF32) when including sparsity optimizations. Implemented in a TSMC 7nm N7 process, the A100 die (Fig. 3.2.7) contains 54B transistors and measures 826mm².
Database: OpenAIRE
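
The TF32 datatype highlighted in the abstract lets FP32 matrix math run on the 3rd-generation Tensor Cores. A minimal sketch of how a program opts in through cuBLAS follows; cuBLAS, the matrix size N, and the alpha/beta scalars are illustrative assumptions, not material from the paper.

// Hypothetical sketch: TF32 Tensor Core GEMM on an A100 via cuBLAS (CUDA 11+).
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int N = 4096;                      // illustrative matrix dimension
    float *A, *B, *C;                        // FP32 storage in device memory
    cudaMalloc(&A, (size_t)N * N * sizeof(float));
    cudaMalloc(&B, (size_t)N * N * sizeof(float));
    cudaMalloc(&C, (size_t)N * N * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);
    // Opt in to TF32 math: inputs are rounded to TF32 (8-bit exponent,
    // 10-bit mantissa) inside the Tensor Cores; accumulation stays FP32.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C, executed on the Tensor Cores.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, A, N, B, N, &beta, C, N);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}

For scale, the quoted peak figures are consistent with the clock and SM count given above: assuming the A100 whitepaper figure of 1024 dense FP16 FMAs per SM per clock, 108 SMs × 1024 FMA × 2 ops × 1.41GHz ≈ 312 TFLOPS dense FP16, doubled to the quoted 624 TFLOPS by 2:4 fine-grained sparsity.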