Exploring Multilevel Cache Hierarchies in Application Specific MPSoCs

Authors:	Isuru Nawinne, Roshan Ragel, Swarnalatha Radhakrishnan, Haris Javaid, Sri Parameswaran
Publication year:	2015
Subject:	Snoopy cache; Hardware_MEMORYSTRUCTURES; Computer science; Cache coloring; CPU cache; Multiprocessing; Parallel computing; Cache pollution; Cache-oblivious algorithm; Computer Graphics and Computer-Aided Design; MESIF protocol; Smart Cache; Tag RAM; Cache invalidation; Write-once; Bus sniffing; Page cache; Cache; Electrical and Electronic Engineering; Cache hierarchy; Average memory access time; Cache algorithms; Software; Cache coherence
Source:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 34:1991-2003
ISSN:	1937-4151 0278-0070
DOI:	10.1109/tcad.2015.2445736
Description:	Multiprocessor systems make use of multilevel cache hierarchies to improve overall memory access speed. Embedded systems typically use configurable processors, where the caches in the system can be customized for a given application or a set of applications. Finding the optimal or a near-optimal set size, block size, and associativity of each of the caches in a multilevel cache hierarchy is a challenging task due to the presence of billions or even trillions of design points. This paper presents an iterative exploration method to find suitable configurations for all the caches in the hierarchy of an application-specific multiprocessor system-on-chip, to improve memory access speed. We propose an algorithm and combine it with the use of specialized hardware for parallel cache simulation to enable multiple back-and-forth iterations through the cache levels. In every iteration, our algorithm explores selected portions of the entire design space to quickly converge upon the final design point. We demonstrate our methodology on two- and three-level cache hierarchies with private and shared caches in a quad-core system, consisting of 5.4 billion and 10.4 trillion design points, respectively. Our method was able to find design points with up to 18.9% lower average memory access time while reducing total cache size by up to 74.15%, compared to a state-of-the-art noniterative method. The number of design points explored was $4\times$ higher in our method, which is still a mere $3.6\times 10^{-5}$% of the entire design space, and took 6.08 h.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::5b97561b43fa0f1695bcce98ad77c65c https://doi.org/10.1109/tcad.2015.2445736
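
The description above compares design points by average memory access time (AMAT). As a rough illustration only, the standard textbook AMAT recurrence for a multilevel hierarchy can be sketched as below. This is not the paper's exploration algorithm; the function name `amat` and all latency and miss-rate values are hypothetical, chosen purely for demonstration.

```python
def amat(levels, memory_latency):
    """Average memory access time for a multilevel cache hierarchy.

    levels: list of (hit_time, miss_rate) pairs ordered from L1 outward.
    memory_latency: access time of main memory, in the same unit as hit_time.

    Applies AMAT = hit_time + miss_rate * miss_penalty, where the miss
    penalty of each level is the AMAT of the next level down.
    """
    penalty = memory_latency
    # Work from the last cache level back towards L1.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


# Hypothetical two-level hierarchy (values in cycles, assumed for illustration):
# L1: 1-cycle hit, 10% miss rate; L2: 10-cycle hit, 20% miss rate; memory: 100 cycles.
two_level = amat([(1, 0.10), (10, 0.20)], memory_latency=100)
print(f"AMAT: {two_level:.2f} cycles")
```

Because each cache's set size, block size, and associativity change its miss rate, a change at one level shifts the effective miss penalty seen by the levels above it, which is one reason the levels cannot be tuned in isolation.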