TerrierTail: Mitigating Tail Latency of Cloud Virtual Machines

Authors: SeyedAlireza SanaeeKohroudi, Mohsen Sharifi, Azer Bestavros, Esmail Asyabi
Year: 2018
Subject:
Source: IEEE Transactions on Parallel and Distributed Systems, vol. 29, pp. 2346-2359
ISSN: 2161-9883, 1045-9219
DOI: 10.1109/tpds.2018.2827075
Description: Large-scale online services parallelize the sub-operations of a user's request across many physical machines (service components) to improve responsiveness. Because even a temporary latency spike in any single service component can notably inflate the end-to-end delay, the tail of the latency distribution of service components has become a subject of intensive research. Key cloud characteristics such as elasticity and on-demand resource provisioning make clouds attractive for hosting large-scale online services, with VMs as the building blocks of those services. However, traditional hypervisor scheduling policies cause unpredictable CPU access latencies for the virtual CPUs (vCPUs) responsible for network IO processing. The result is poor and unpredictable network IO performance, which exacerbates VMs' long tail latencies and discourages hosting large-scale parallel web services on virtualized clouds. This paper presents TerrierTail, a hypervisor CPU scheduler whose primary goal is to trim the tail of the latency distribution of individual VMs in virtualized clouds. In TerrierTail, we have modified the network driver to identify the vCPUs responsible for network IO processing. Using this information, the TerrierTail scheduler applies novel scheduling policies that reduce the CPU access latencies of these vCPUs, yielding higher and more predictable network IO performance and therefore lower tail latency. TerrierTail's gains come with no measurable negative impact on other performance attributes (e.g., fairness) or on the performance of VMs running other types of workloads (e.g., CPU-intensive VMs). A prototype implementation of TerrierTail in the Xen hypervisor substantially outperforms Xen's default Credit scheduler.
For example, TerrierTail reduces the 99.9th-percentile latency of a Memcached server by up to 53 percent and that of an RPC server by up to 50 percent.
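The abstract's premise, that a latency spike in any one service component inflates end-to-end delay, can be made concrete with a standard back-of-the-envelope calculation that is not taken from the paper. Suppose a request fans out to N components, each independently fast (at or below its own p-th percentile latency) with probability p. The request is then slow whenever at least one component is slow, which happens with probability 1 - p^N. The sketch below computes this value and checks it with a small Monte Carlo simulation. The function names, the independence assumption, and the exponential latency distribution are illustrative assumptions only and do not model TerrierTail or Xen.

```python
import math
import random


def prob_request_hits_tail(p: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent components
    responds slower than its own p-th percentile latency (hypothetical helper).

    Each component is fast with probability p, so all of them are fast
    with probability p ** fanout.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    if fanout < 1:
        raise ValueError("fanout must be >= 1")
    return 1.0 - p ** fanout


def simulate_hits_tail(p: float, fanout: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check, ASSUMING i.i.d. exponential(1) component latencies.

    A fan-out request finishes when its slowest component does, so its
    end-to-end latency is the max over components. We count how often that
    max exceeds a single component's p-th percentile.
    """
    rng = random.Random(seed)
    # Inverse CDF of exponential(1): the p-th percentile is -ln(1 - p).
    threshold = -math.log(1.0 - p)
    slow = 0
    for _ in range(trials):
        end_to_end = max(rng.expovariate(1.0) for _ in range(fanout))
        if end_to_end > threshold:
            slow += 1
    return slow / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(prob_request_hits_tail(0.99, n), 3))
```

With p = 0.99, the chance of hitting a component's 99th-percentile tail grows from about 1% for a single component to roughly 9.6% for a fan-out of 10 and about 63% for a fan-out of 100. This amplification is why trimming the tail of each individual VM, as TerrierTail aims to do, matters for large fan-out services.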
Database: OpenAIRE