Enabling Large-Reach TLBs for High-Throughput Processors by Exploiting Memory Subregion Contiguity

Author:	Yu, Chao; Bai, Yuebin; Wang, Rui
Year of publication:	2021
Subject:	Computer Science - Hardware Architecture
Document type:	Working Paper
Description:	Accelerators such as GPUs are increasingly used to meet future performance demands, and sharing a single virtual address space between CPUs and GPUs is increasingly adopted to simplify programming. However, address translation, a key component of virtual memory, is becoming a performance bottleneck for GPUs. Because of the SIMT execution model, a single TLB miss can stall hundreds of threads, degrading performance dramatically. Through analysis of real systems, we observe that the OS exhibits advanced contiguity (e.g., hundreds of contiguous pages), and that more large memory regions with advanced contiguity tend to be allocated as working sets grow. Leveraging this observation, we propose MESC to improve translation efficiency for GPUs. The key idea of MESC is to divide each large page frame (2MB) in the virtual memory space into fixed-size memory subregions (i.e., 64 4KB pages each) and to store the contiguity information of subregions and large page frames in L2PTEs. With MESC, address translations for up to 512 pages can be coalesced into a single TLB entry, without changing the memory allocation policy (i.e., demand paging) and without requiring large-page support. In our experiments, MESC achieves a 77.2% performance improvement and a 76.4% reduction in dynamic translation energy for translation-sensitive workloads.
Database:	arXiv
External link:	http://arxiv.org/abs/2110.08613
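
The sizes quoted in the description imply a simple address decomposition, sketched below as an illustration only. It uses nothing beyond the figures stated in the abstract (2MB large page frames, 4KB pages, 64 pages per subregion); the constant names and the `locate` helper are hypothetical and are not taken from the paper, and the sketch does not model how MESC encodes contiguity in L2PTEs or TLB entries.

```python
# Illustrative arithmetic only; names are assumptions, not the paper's design.
PAGE_SIZE = 4 * 1024                 # 4KB base page
LARGE_FRAME_SIZE = 2 * 1024 * 1024   # 2MB large page frame
PAGES_PER_SUBREGION = 64             # subregion size stated in the abstract

# 2MB / 4KB pages per large page frame, matching the "up to 512 pages" figure.
PAGES_PER_FRAME = LARGE_FRAME_SIZE // PAGE_SIZE
# Number of 64-page subregions that fit in one large page frame.
SUBREGIONS_PER_FRAME = PAGES_PER_FRAME // PAGES_PER_SUBREGION


def locate(vaddr):
    """Split a virtual address into (large frame, subregion, page in subregion)."""
    frame = vaddr // LARGE_FRAME_SIZE
    page_in_frame = (vaddr % LARGE_FRAME_SIZE) // PAGE_SIZE
    subregion = page_in_frame // PAGES_PER_SUBREGION
    page_in_subregion = page_in_frame % PAGES_PER_SUBREGION
    return frame, subregion, page_in_subregion


if __name__ == "__main__":
    print(PAGES_PER_FRAME, SUBREGIONS_PER_FRAME)
    print(locate(LARGE_FRAME_SIZE + 65 * PAGE_SIZE + 12))
```

Under these assumptions each large page frame holds 8 subregions, so a single entry covering a fully contiguous frame would span all 512 base pages.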