Learn-as-you-go with Megh: Efficient Live Migration of Virtual Machines
Author: | Stéphane Bressan, Debabrota Basu, Haibo Chen, Yang Hong, Xiayang Wang |
---|---|
Year of publication: | 2019 |
Subject: |
Theoretical computer science
Computer science, Distributed computing, Cloud computing, Scheduling (computing), PlanetLab, Reinforcement learning, Resource management, Online algorithm, Workload, Energy consumption, Computational Theory and Mathematics, Hardware and Architecture, Virtual machine, Signal Processing, Scalability, CloudSim, Markov decision process, Heuristics, Live migration |
Source: | ICDCS |
ISSN: | 2161-9883 1045-9219 |
DOI: | 10.1109/tpds.2019.2893648 |
Description: | Cloud providers leverage live migration of virtual machines to reduce energy consumption and allocate resources efficiently in data centers. Each migration decision answers three questions: when to move a virtual machine, which virtual machine to move, and where to move it. Dynamic, uncertain, and heterogeneous workloads running on virtual machines make such decisions difficult. Knowledge-based and heuristics-based algorithms are commonly used to tackle this problem. Knowledge-based algorithms, such as MaxWeight scheduling algorithms, depend on the specifics and the dynamics of the targeted Cloud architectures and applications. Heuristics-based algorithms, such as MMT algorithms, suffer from high variance and poor convergence because of their greedy approach. We propose an online reinforcement learning algorithm called Megh. Megh does not require prior knowledge of the workload; rather, it learns the dynamics of workloads as it goes. Megh models the problem of energy- and performance-efficient resource management during live migration as a Markov decision process and solves it using a functional approximation scheme. While several reinforcement learning algorithms have been proposed to solve this problem, they remain confined to the academic realm because they face the curse of dimensionality. They are either not scalable in real time, as is the case for MadVM, or need elaborate offline training, as is the case for Q-learning. These algorithms often incur execution overheads that are comparable to the migration time of a VM. Megh overcomes these deficiencies. Megh uses a novel dimensionality reduction scheme to project the combinatorially explosive state-action space to a polynomial dimensional space with a sparse basis. Megh has the capacity to learn uncertain dynamics and the ability to work in real time without incurring significant execution overhead. Megh is both scalable and robust.
We implement Megh using the CloudSim toolkit and empirically evaluate its performance with the PlanetLab and the Google Cluster workloads. Experiments validate that Megh is more cost-effective, converges faster, incurs smaller execution overhead and is more scalable than MadVM and MMT. An empirical sensitivity analysis explicates the choice of parameters in experiments. |
Database: | OpenAIRE |
External link: |
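
The description mentions projecting a combinatorially large state-action space onto a polynomial-dimensional space with a sparse basis, and solving the resulting Markov decision process by function approximation. The record does not give the construction, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes one indicator feature per (VM, host) pair and a linear cost-to-go estimate updated by a plain temporal-difference step. All names (`feature_index`, `sparse_features`, `q_value`, `td_update`) and the parameters `alpha` and `gamma` are hypothetical.

```python
def feature_index(vm: int, host: int, num_hosts: int) -> int:
    """Index of the indicator feature for 'VM vm is placed on host host'."""
    return vm * num_hosts + host


def sparse_features(placement, num_hosts):
    """Active feature indices for a placement, where placement[vm] = host.

    Only len(placement) of the num_vms * num_hosts features are non-zero,
    which is what makes this (assumed) basis sparse.
    """
    return [feature_index(vm, host, num_hosts) for vm, host in enumerate(placement)]


def q_value(weights, active):
    """Linear estimate: sum of the weights of the active indicator features."""
    return sum(weights[i] for i in active)


def td_update(weights, active, cost, next_active, alpha=0.1, gamma=0.9):
    """One temporal-difference step on the linear estimate (illustrative only)."""
    td_error = cost + gamma * q_value(weights, next_active) - q_value(weights, active)
    for i in active:
        weights[i] += alpha * td_error
    return td_error


# Illustrative sizes: 10 VMs on 4 hosts.
num_vms, num_hosts = 10, 4
num_placements = num_hosts ** num_vms   # exhaustive enumeration: exponential in num_vms
num_features = num_vms * num_hosts      # assumed sparse basis: polynomial

weights = [0.0] * num_features
before = sparse_features([0] * num_vms, num_hosts)        # every VM on host 0
after = sparse_features([1] + [0] * 9, num_hosts)         # VM 0 migrated to host 1
error = td_update(weights, before, cost=1.0, next_active=after)
```

With these sizes, enumerating placements would need 4^10 entries, while the assumed basis needs 40 weights, of which only 10 are touched per update.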