Efficient straggler task management in cloud environment using stochastic gradient descent with momentum learning-driven neural networks.

Authors: Swain, Smruti Rekha; Parashar, Anshu; Singh, Ashutosh Kumar; Lee, Chung Nan
Subject:
Source: Cluster Computing; Jul2024, Vol. 27 Issue 4, p4673-4685, 13p
Abstract: Large-scale computing systems split jobs into smaller tasks that execute simultaneously, which accelerates job completion and reduces energy usage. However, cloud computing systems face a significant challenge: the Long Tail problem. This problem arises when a small subset of slow-performing tasks impedes the overall progress of parallel job execution, resulting in longer service response times and decreased system efficiency. To reduce task execution time and energy consumption, we propose an efficient straggler task management framework for cloud data centers. First, a neural network-based resource predictor is developed and tuned with the Stochastic Gradient Descent with Momentum mechanism to analyze heterogeneous tasks and classify them into stragglers and non-stragglers. The identified straggler tasks are then further classified into two categories, Resource Hunters and Long-Tail stragglers, based on their specific resource requirements. A task management policy is implemented to achieve parallelism and enhance sustainability in the cloud infrastructure. Taking the task category into account, this policy schedules user job requests and allocates resources among them. To evaluate the proposed work, extensive simulations are performed using the Google Cluster Dataset (GCD), and the results are compared with state-of-the-art techniques. The experimental results show reductions of up to 55.16% in power consumption and up to 35% in the number of active servers, along with a reduction in execution time of up to 67.74%. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
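
Note: The abstract names Stochastic Gradient Descent with Momentum as the mechanism used to tune the straggler classifier but gives no implementation details. The following is a minimal illustrative sketch of the general technique only, not the authors' model: a single-neuron (logistic) classifier trained with the classical momentum update. The two features (normalized CPU demand and normalized duration), the toy data, the hyperparameters, and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgdm(samples, lr=0.1, momentum=0.9, epochs=300, seed=0):
    """Fit a one-neuron classifier with SGD plus classical momentum.

    samples: list of (features, label); label 1 = straggler, 0 = non-straggler.
    Hypothetical helper; hyperparameters are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0      # weights and bias
    vw, vb = [0.0] * n, 0.0    # velocity terms that accumulate past gradients
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the pre-activation
            for i in range(n):
                # Momentum update: v <- momentum * v - lr * grad; w <- w + v
                vw[i] = momentum * vw[i] - lr * err * x[i]
                w[i] += vw[i]
            vb = momentum * vb - lr * err
            b += vb
    return w, b


def predict(w, b, x):
    # Returns 1 (straggler) when the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Toy, made-up tasks: (normalized CPU demand, normalized duration) -> label.
toy_tasks = [
    ((0.2, 0.1), 0), ((0.3, 0.2), 0), ((0.1, 0.3), 0), ((0.4, 0.2), 0),
    ((0.8, 0.9), 1), ((0.7, 0.8), 1), ((0.9, 0.7), 1), ((0.6, 0.9), 1),
]
w, b = train_sgdm(toy_tasks)
labels = [predict(w, b, x) for x, _ in toy_tasks]
```

The velocity terms carry a decaying sum of earlier gradients, which is what distinguishes momentum from plain SGD. A real predictor of the kind the abstract describes would use a multi-layer network and features drawn from cluster traces.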