Architectural support for efficient message passing on shared memory multi-cores
Authors: | Osman Unsal, Adrian Cristal, Rubén Titos-Gil, Oscar Palomar |
---|---|
Year of publication: | 2016 |
Subject: | 010302 applied physics; 020203 distributed computing; Multi-core processor; Coprocessor; Computer Networks and Communications; Computer science; Message passing; 02 engineering and technology; computer.software_genre; 01 natural sciences; Theoretical Computer Science; Instruction set; Shared memory; Artificial Intelligence; Hardware and Architecture; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Operating system; Cache; Central processing unit; computer; Software; Efficient energy use |
Source: | Journal of Parallel and Distributed Computing. 95:92-106 |
ISSN: | 0743-7315 |
Description: | Thanks to programming approaches like actor-based models, message passing is regaining popularity outside large-scale scientific computing for building scalable distributed applications on multi-core processors. Unfortunately, the mismatch between message passing models and today's shared-memory hardware provided by commercial vendors results in suboptimal performance and wasted energy. This paper presents a set of architectural extensions to reduce the overheads incurred by message passing workloads running on shared memory multi-core architectures. It describes the instruction set extensions and the hardware implementation. To facilitate programmability, the proposed extensions are used by a message passing library, allowing programs to take advantage of them transparently. As a proof of concept, we use modified MPI libraries and unmodified MPI programs to evaluate the proposal. Experimental results show that a best-effort design can eliminate over 60% of the cache accesses caused by message data transmission and reduce the cycles spent on that task by 75%, while adding a simple coprocessor can completely off-load data movement from the CPU, avoiding up to 92% of cache accesses and reducing network traffic by 12% on average. The design achieves an improvement of 11%-12% in the energy-delay product of on-chip caches. We present hardware support to reduce overheads incurred by message passing (MP). We modified an MPI library to add support for our ISA extensions. Our design eliminates 60%-92% of cache accesses during data transfer. Adding simple MP support to shared memory multicores improves energy efficiency. |
Database: | OpenAIRE |