Optimize datacenter management with multi-tier thermal-intelligent workload placement

Authors: Abishai Daniel, Nishi Ahuja, Chuan Song, Chun Wang, Xiang Zhou
Year of publication: 2015
Subject:
Source: 2015 31st Thermal Measurement, Modeling & Management Symposium (SEMI-THERM).
DOI: 10.1109/semi-therm.2015.7100134
Description: Rapid growth of internet services and mobile devices has led to more and larger cloud data centers. Hyperscale cloud data centers consume enormous amounts of electricity, which puts pressure on operating costs and infrastructure management. The industry has made great progress in improving power usage effectiveness (PUE) through innovation and infrastructure upgrades. Recent research focuses on dynamically adjusting workload placement according to real-time power and thermal telemetry from the datacenter infrastructure, both to relieve power and thermal stress and to improve PUE; the thermal awareness scheduler (TAS) is one example. To achieve higher density and a longer lifecycle for high-value computing components, the conventional rack-mount server system is evolving into a rack-scale server system in which power and cooling units move to the rack level and are shared by multiple server systems, as in the Facebook-led Open Compute Project and Project Scorpio, developed by the Chinese internet giants Baidu, Alibaba and Tencent. Traditional thermal-aware workload placement assumes that all server systems within a cluster are uniform, with discrete power and cooling units, and that there is no power or thermal correlation between different server systems. However, once power and cooling units move to the rack level, the power and thermal correlation between server systems must be considered when calculating the optimal workload placement. To address these challenges, this paper proposes a framework for multi-tier thermal-intelligent workload placement, together with corresponding thermal management algorithms tailored to rack-scale server systems with shared power and cooling units. The paper evaluates these thermal management algorithms in terms of performance, benefits, and usage scenarios.
The prototype and experiments described in this paper run on an OpenStack-managed cluster, but the thermal-intelligent workload placement and corresponding thermal management policies aim to provide a common framework that applies beyond the cloud OS, for example to big data software stacks and even to customers' own distributed computing and storage systems.
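The idea of placing workloads in tiers when servers share rack-level power and cooling can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is an assumed two-tier heuristic in which the first tier picks the rack with the most headroom on its shared power unit and the second tier picks the coolest server in that rack. All names (`Rack`, `Server`, `place`) and all telemetry fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Server:
    name: str
    inlet_temp_c: float  # hypothetical telemetry: inlet air temperature
    free_cpus: int       # CPUs still available for new workloads


@dataclass
class Rack:
    name: str
    power_cap_w: float   # capacity of the shared rack-level power unit
    power_draw_w: float  # current draw summed over all servers in the rack
    servers: List[Server] = field(default_factory=list)

    def headroom_w(self) -> float:
        """Remaining capacity on the shared power unit."""
        return self.power_cap_w - self.power_draw_w


def place(racks: List[Rack], cpus_needed: int,
          est_power_w: float) -> Optional[Tuple[str, str]]:
    """Two-tier placement (illustrative assumption, not the paper's method).

    Tier 1: among racks whose shared power unit can absorb the workload and
            that contain at least one server with enough free CPUs, choose
            the rack with the largest power headroom.
    Tier 2: inside that rack, choose the fitting server with the lowest
            inlet temperature.
    Returns (rack name, server name), or None if nothing fits.
    """
    candidates = []
    for rack in racks:
        if rack.headroom_w() < est_power_w:
            continue  # shared power unit would be oversubscribed
        fitting = [s for s in rack.servers if s.free_cpus >= cpus_needed]
        if fitting:
            candidates.append((rack, fitting))
    if not candidates:
        return None

    rack, fitting = max(candidates, key=lambda c: c[0].headroom_w())
    server = min(fitting, key=lambda s: s.inlet_temp_c)

    # Account for the new workload so later placements see the shared load.
    server.free_cpus -= cpus_needed
    rack.power_draw_w += est_power_w
    return rack.name, server.name


racks = [
    Rack("r1", 1000.0, 900.0, [Server("a", 22.0, 8)]),
    Rack("r2", 1000.0, 400.0, [Server("b", 30.0, 8),
                               Server("c", 25.0, 8),
                               Server("d", 20.0, 2)]),
]
first = place(racks, cpus_needed=4, est_power_w=150.0)
```

Because the rack's power draw is updated after each placement, a workload placed on one server reduces the headroom seen by every other server in the same rack, which is the coupling that per-server schedulers ignore.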
Database: OpenAIRE