Straggler identification approach in large data processing frameworks using ensembled gradient boosting in smart-cities cloud services
Author: | Shyam Deshmukh, Komati Thirupathi Rao |
---|---|
Publication year: | 2021 |
Subject: | |
Source: | International Journal of System Assurance Engineering and Management. 13:146-155 |
ISSN: | 0976-4348 0975-6809 |
DOI: | 10.1007/s13198-021-01311-8 |
Description: | Making a smart city efficient requires mining the large volumes of data generated by cyber-physical systems and electronic platforms, using large-scale data processing frameworks in a cloud environment. Many cloud services rely on data-parallel computing frameworks that run on hundreds of interconnected nodes. These frameworks divide computationally intensive and data-intensive jobs into smaller tasks and run them concurrently on different nodes to improve performance. Delivering that improvement is challenging, however, because of runtime variability: owing to various internal and external factors, some nodes running these tasks perform poorly, which delays job execution. Because runtime variability is inherently complex, preventive measures against stragglers have proved inadequate, and the problem continues to affect compute workloads even after such measures are taken. Several researchers have proposed dynamic straggler identification approaches based on historical log analysis. This paper analyzes the relationships among several parameters collected during job execution to help formulate and detect stragglers. Using data analysis, we developed a straggler identification approach and labeled the generated dataset. To achieve high performance using statistical features of historical resource usage, the proposed approach trains a distributed XGBoost classifier, which showed the highest accuracy, 88.57%. Furthermore, we show empirically that blacklisting predicted stragglers significantly reduces the execution times of CPU, I/O, and mixed applications. |
Database: | OpenAIRE |
External link: |
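
The labeling and blacklisting steps mentioned in the description can be illustrated with a minimal sketch. The record does not state the paper's labeling rule, so the rule used here (a task is a straggler if its duration exceeds 1.5 times the median task duration) is an assumption, as are the function names, the `min_stragglers` parameter, and the sample data. The sketch does not reproduce the paper's distributed XGBoost classifier; it only shows how labels of this kind could be produced and how nodes might be blacklisted from them.

```python
from collections import Counter
from statistics import median


def label_stragglers(durations, threshold=1.5):
    """Return a 0/1 label per task: 1 if duration > threshold * median.

    The 1.5x-median rule is an assumed convention, not the paper's stated rule.
    """
    cutoff = threshold * median(durations)
    return [1 if d > cutoff else 0 for d in durations]


def nodes_to_blacklist(task_nodes, labels, min_stragglers=2):
    """Return the sorted nodes that hosted at least `min_stragglers` stragglers.

    `min_stragglers` is a hypothetical tuning knob for this illustration.
    """
    counts = Counter(node for node, is_slow in zip(task_nodes, labels) if is_slow)
    return sorted(node for node, count in counts.items() if count >= min_stragglers)


# Hypothetical task durations (seconds) and the node each task ran on.
durations = [10.0, 11.0, 9.5, 30.0, 10.5, 28.0]
task_nodes = ["n1", "n2", "n3", "n4", "n1", "n4"]

labels = label_stragglers(durations)
blacklist = nodes_to_blacklist(task_nodes, labels)
print(labels, blacklist)
```

In a pipeline like the one described, labels of this kind would serve as training targets for a classifier over resource-usage features, and the blacklist would be derived from predicted rather than observed stragglers.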