Collaborative Learning Based Straggler Prevention in Large-Scale Distributed Computing Framework

Author:	Shyam Deshmukh, Mohammad Shabaz, Komati Thirupathi Rao
Language:	English
Year of publication:	2021
Subject:	Science (General); Article Subject; Computer Networks and Communications; business.industry; Computer science; Distributed computing; Big data; 020206 networking & telecommunications; Collaborative learning; Workload; 02 engineering and technology; Bottleneck; Q1-390; Software; 020204 information systems; Computer cluster; Node (computer science); 0202 electrical engineering, electronic engineering, information engineering; T1-995; Performance improvement; business; Technology (General); Information Systems
Source:	Security and Communication Networks, Vol 2021 (2021)
ISSN:	1939-0114
DOI:	10.1155/2021/8340925
Description:	Modern big data applications favor cluster computing, relying on distributed computing frameworks that serve user jobs on demand. Such a framework processes jobs rapidly by subdividing them into tasks that execute in parallel. Because of the complex environment and hardware and software issues, some tasks may run slowly and delay job completion; such tasks are known as stragglers. Straggling nodes, caused by factors such as shared resources, heavy system load, or hardware issues, prolong job execution time and thus bottleneck the performance of the distributed computing framework. Many state-of-the-art approaches use an independent model per node and workload. As nodes and workloads increase, the number of models grows, and even with a large number of nodes, not every node can capture stragglers, because it may lack sufficient training data on straggler patterns, which yields suboptimal straggler prediction. To alleviate these problems, we propose a novel collaborative learning-based approach for straggler prediction built on the alternating direction method of multipliers (ADMM), which is resource-efficient and learns to mitigate stragglers without moving data to a centralized location. The proposed framework shares information among the models, which allows the use of more training data and reduces training time by avoiding data transfer. We rigorously evaluate the proposed method on various datasets and obtain high accuracy.
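The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of generic consensus ADMM, assuming a ridge-regression objective as a stand-in for the straggler predictor. The function name `consensus_admm` and the parameters `rho`, `lam`, and `iters` are hypothetical. It illustrates the idea described above: each node fits a local model on its own data, and only model vectors (never raw data) are exchanged to form a shared consensus model.

```python
import numpy as np

def consensus_admm(node_data, rho=1.0, lam=0.1, iters=300):
    """Generic consensus ADMM for ridge regression (illustrative stand-in).

    node_data: list of (X_i, y_i) pairs; each pair stays on its own node.
    Returns the shared consensus weight vector z.
    """
    n = len(node_data)
    d = node_data[0][0].shape[1]
    w = [np.zeros(d) for _ in range(n)]  # local models, one per node
    u = [np.zeros(d) for _ in range(n)]  # scaled dual variables
    z = np.zeros(d)                      # consensus model shared by all nodes

    for _ in range(iters):
        # Local step: each node solves
        #   min_w 0.5*||X_i w - y_i||^2 + (rho/2)*||w - z + u_i||^2
        # using only its own data.
        for i, (X, y) in enumerate(node_data):
            A = X.T @ X + rho * np.eye(d)
            b = X.T @ y + rho * (z - u[i])
            w[i] = np.linalg.solve(A, b)

        # Global step: only the vectors w_i + u_i are aggregated,
        #   min_z (lam/2)*||z||^2 + sum_i (rho/2)*||w_i - z + u_i||^2
        z = rho * sum(w[i] + u[i] for i in range(n)) / (lam + n * rho)

        # Dual step: each node accumulates its disagreement with z.
        for i in range(n):
            u[i] = u[i] + w[i] - z

    return z
```

Under these assumptions, the consensus vector should approach the ridge solution that a centralized fit on the pooled data would give, even though no node ever sees another node's samples. A classifier for straggler prediction would replace the least-squares local step with the corresponding loss.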
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e8caebd4f2f624d738f0d756a980e2f3