Node Capability Modeling for Reduce Phase’s Scheduling in MapReduce Environment

Authors: Tao Li, Tao Gu, Qun Liao, Chuang Zuo, Yulu Yang
Publication year: 2015
Subject:
Source: Cloud Computing and Big Data ISBN: 9783319284293
CloudCom-Asia
Description: MapReduce is a programming model widely used in big data processing. The scheduling of reduce tasks in MapReduce is a key issue that significantly affects performance. Unfortunately, because reduce task scheduling is complex, there is no widely accepted solution to this problem. The main approaches to optimizing reduce task scheduling emphasize features of computation or data locality. Although a few studies have explored solutions through theoretical modeling, their models are oversimplified. To optimize reduce task scheduling, we propose a method that models a node's computation and communication capabilities uniformly, based on a theoretical analysis of the reduce phase. The analysis integrates the costs that reduce tasks incur in fetching and processing intermediate data. Using the proposed model, the optimal load balance of the reduce phase is derived and proved. Evaluations under different environments show that the load balance of the reduce phase improves significantly when scheduling is guided by this optimality principle.
Database: OpenAIRE