CTS: An operating system CPU scheduler to mitigate tail latency for latency-sensitive multi-threaded applications

Author: Esmail Asyabi, SeyedAlireza SanaeeKohroudi, Mohsen Sharifi, Erfan Sharafzadeh
Year of publication: 2019
Subject:
Source: Journal of Parallel and Distributed Computing. 133:232-243
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2018.04.003
Description: Large-scale interactive Web services break a user's request into many sub-requests and send them to a large number of independent servers so as to consult multi-terabyte datasets instantaneously. Service responsiveness hinges on the slowest server, making the tail of the latency distribution of individual servers a matter of great concern. Many latency-sensitive applications hosted on individual servers use a thread-driven concurrency model, wherein a thread is spawned for each user connection. Threaded applications rely on the operating system CPU scheduler to determine the order of thread execution. Our experiments show that idiosyncrasies of the default Linux scheduler (CFS) result in LCFS (Last Come First Served) scheduling of threads belonging to the same application. However, studies have shown that FCFS (First Come First Served) scheduling yields the lowest response time variability and tail latency, making the default Linux scheduler a source of long tail latency for multi-threaded applications. In this paper, we present CTS, an operating system CPU scheduler that trims the tail of the latency distribution for latency-sensitive multi-threaded applications while maintaining the key characteristics of the default Linux scheduler (e.g., fairness). By adding new data structures to the Linux kernel, CTS tracks the threads belonging to an application in a timely manner and schedules them in FCFS order, mitigating tail latency. To keep the existing features of the default Linux scheduler intact, CTS leaves CFS responsible for system-wide load balancing and core-level process scheduling; CTS merely schedules the threads of the process chosen by CFS in FCFS order, mitigating tail latency without sacrificing the properties of the default Linux scheduler. Experiments with a prototype implementation of CTS in the Linux kernel demonstrate that CTS significantly outperforms the default Linux scheduler.
For example, CTS reduces the 99.9th-percentile latency of a Null RPC server by up to 96%, of a Thrift server by up to 90%, and of an Apache Web server by up to 51%.
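The claim that FCFS ordering shortens the tail relative to LCFS can be illustrated with a toy queueing simulation. This is a minimal sketch, not CTS and not the paper's experimental setup: it assumes a single core, non-preemptive execution, a fixed service time of 1.0 per request, and Poisson arrivals at 90% load, and the helper names `simulate` and `percentile` are hypothetical. Because every request takes the same time, both policies complete requests at identical instants, so the mean latency matches; only the ordering, and hence the tail, differs.

```python
import math
import random
import statistics
from collections import deque


def simulate(arrivals, service, policy):
    """Return per-request response times (completion - arrival) on one
    non-preemptive server. `arrivals` must be sorted ascending;
    `policy` is "fcfs" (oldest waiting first) or "lcfs" (newest first)."""
    n = len(arrivals)
    response = [0.0] * n
    waiting = deque()   # indices of requests that arrived but have not started
    next_arrival = 0    # index of the first request that has not arrived yet
    clock = 0.0
    done = 0
    while done < n:
        # Admit every request that has arrived by the current time.
        while next_arrival < n and arrivals[next_arrival] <= clock:
            waiting.append(next_arrival)
            next_arrival += 1
        if not waiting:
            # Server is idle: jump forward to the next arrival.
            clock = arrivals[next_arrival]
            continue
        # FCFS takes the oldest waiting request, LCFS the most recent one.
        i = waiting.popleft() if policy == "fcfs" else waiting.pop()
        clock += service
        response[i] = clock - arrivals[i]
        done += 1
    return response


def percentile(values, p):
    """Nearest-rank percentile, e.g. p=99.9 for the 99.9th percentile."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank - 1, 0)]


rng = random.Random(42)
t = 0.0
arrivals = []
for _ in range(20000):
    t += rng.expovariate(0.9)  # mean inter-arrival 1/0.9, i.e. 90% load
    arrivals.append(t)

fcfs = simulate(arrivals, 1.0, "fcfs")
lcfs = simulate(arrivals, 1.0, "lcfs")
for name, r in (("FCFS", fcfs), ("LCFS", lcfs)):
    print(f"{name}: mean={statistics.fmean(r):.2f} "
          f"p99.9={percentile(r, 99.9):.2f} max={max(r):.2f}")
reduction = 100 * (1 - percentile(fcfs, 99.9) / percentile(lcfs, 99.9))
print(f"p99.9 reduction from FCFS ordering: {reduction:.0f}%")
```

Serving the newest request first lets early arrivals starve behind later ones during a busy period, which is what inflates the LCFS tail; the last line computes a percentage reduction at the 99.9th percentile in the same spirit as the figures quoted above, though the toy numbers are not comparable to the paper's measurements.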
Database: OpenAIRE