Improving execution unit occupancy on SMT-based processors through hardware-aware thread scheduling
Authors: | Achille Peternier, Danilo Ansaloni, Cesare Pautasso, Daniele Bonetta, Walter Binder |
---|---|
Year of publication: | 2014 |
Subject: | 010302 applied physics; Multi-core processor; Schedule; Speedup; Computer Networks and Communications; Computer science; 02 engineering and technology; Thread (computing); Simultaneous multithreading; 01 natural sciences; 020202 computer hardware & architecture; Super-threading; Software; Hardware and Architecture; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Operating system; Execution unit; Computer hardware |
Source: | Future Generation Computer Systems. 30:229-241 |
ISSN: | 0167-739X |
DOI: | 10.1016/j.future.2013.06.015 |
Description: | Modern processor architectures are increasingly complex and heterogeneous, often requiring software solutions tailored to the specific hardware characteristics of each processor model. In this article, we address this problem by targeting two processors featuring Simultaneous MultiThreading (SMT) to improve the occupancy of their internal execution units through a sustained stream of instructions coming from more than one thread. We target the AMD Bulldozer and IBM POWER7 processors as case studies for specific hardware-oriented performance optimizations that increase the variety of instructions sent to each core to maximize the occupancy of all its execution units. WorkOver, presented in this article, improves thread scheduling to increase the performance of floating-point-intensive workloads on Linux-based operating systems. WorkOver is a user-space monitoring tool that automatically identifies FPU-intensive threads and schedules them more efficiently without requiring any patches or modifications at the kernel level. Our measurements using standard benchmark suites show that speedups of up to 20% can be achieved by simply allowing WorkOver to monitor applications and schedule their threads, without any modification of the workload. Highlights: We present WorkOver to improve thread scheduling for better performance. We use performance counters to profile integer- and floating-point threads. Threads are scheduled according to hardware execution unit availability. WorkOver optimizes unit occupancy on AMD Bulldozer and IBM POWER7 processors. We measured up to 20% speedup using SPEC CPU and SciMark 2.0. |
Database: | OpenAIRE |
External link: |
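
The scheduling idea summarized in the description (profile threads with performance counters, then place them according to execution unit availability) can be illustrated with a small sketch. It is not WorkOver's actual algorithm, which this record does not detail. It assumes a Bulldozer-like layout in which the cores of one module share a floating-point unit. The names `ThreadSample`, `fp_ratio`, `place_threads`, and the `threshold` value are hypothetical, and the counter values are made-up inputs rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """One profiling sample per thread (hypothetical counter fields)."""
    tid: int
    fp_ops: int      # assumed counter: floating-point operations retired
    total_ops: int   # assumed counter: all instructions retired


def fp_ratio(sample: ThreadSample) -> float:
    """Fraction of retired instructions that were floating-point."""
    return sample.fp_ops / sample.total_ops if sample.total_ops else 0.0


def place_threads(samples, n_modules, threshold=0.3):
    """Return {module_index: [tid, ...]}.

    FPU-intensive threads (ratio >= threshold, an assumed cut-off) are
    spread round-robin over modules so they contend less for a shared FPU;
    the remaining threads then fill the least-loaded modules.
    """
    fp_heavy = sorted(
        (s for s in samples if fp_ratio(s) >= threshold),
        key=fp_ratio, reverse=True,
    )
    others = sorted(
        (s for s in samples if fp_ratio(s) < threshold),
        key=fp_ratio,
    )
    placement = {m: [] for m in range(n_modules)}
    for i, s in enumerate(fp_heavy):
        placement[i % n_modules].append(s.tid)
    for s in others:
        least_loaded = min(placement, key=lambda m: len(placement[m]))
        placement[least_loaded].append(s.tid)
    return placement


if __name__ == "__main__":
    samples = [
        ThreadSample(tid=1, fp_ops=800, total_ops=1000),
        ThreadSample(tid=2, fp_ops=700, total_ops=1000),
        ThreadSample(tid=3, fp_ops=50, total_ops=1000),
        ThreadSample(tid=4, fp_ops=100, total_ops=1000),
    ]
    print(place_threads(samples, n_modules=2))
```

On Linux, a user-space tool could apply such a placement by pinning each thread to the cores of its module, for example with Python's `os.sched_setaffinity`; the mapping from module index to core IDs depends on the machine and is left out here.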