An Optimized Straggler Mitigation Framework for Large-Scale Distributed Computing Systems

Autor: Samar A. Said, Shahira M. Habashy, Sameh A. Salem, Elsayed M. Saad
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: IEEE Access, Vol 10, Pp 97075-97088 (2022)
Druh dokumentu: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3205723
Popis: Nowadays, Big Data becomes a research focus in industrial, banking, social network, and other fields. In addition, the explosive increase of data and information require efficient processing solutions. Therefore, Spark is considered as a promising candidate of Large-Scale Distributed Computing Systems for big data processing. One primary challenge is the straggler problem that occurred due to the presence of heterogeneity where a machine takes an extra-long time to finish execution of a task, which decreases the system throughput. To mitigate straggler tasks, Spark adopts speculative execution mechanism, in which the scheduler launches additional backup to avoid slow task processing and achieve acceleration. In this paper, a new Optimized Straggler Mitigation Framework is proposed. The proposed framework uses a dynamic criterion to determine the closest straggler tasks. This criterion is based on multiple coefficients to achieve a reliable straggler decision. Also, it integrates the historical data analysis and online adaptation for intelligent straggler judgment. This guarantees the effectiveness of speculative tasks by improving cluster performance. Experimental results on various benchmarks and applications show that the proposed framework achieves 23.5% to 30.7% execution time reductions, and 25.4 to 46.3% increase of the cluster throughputs compared with spark engine.
Databáze: Directory of Open Access Journals