Improvement of job completion time in data-intensive cloud computing applications
Authors: | Mostafa A. Bassiouni, Ibrahim Adel Ibrahim |
---|---|
Year of publication: | 2020 |
Subject: |
Scheme (programming language); lcsh:Computer engineering. Computer hardware; lcsh:TK7885-7895; lcsh:Electronic computers. Computer science; lcsh:QA75.5-76.95; Computer Networks and Communications; Computer science; Distributed computing; Cloud computing; Straggler reduce task; Parallel and distributed processing; MapReduce; Sampling; Sampling (statistics); Range (statistics); Data-intensive computing; Skew; Yarn; Task (computing); Completion time; Software; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020203 distributed computing; 020201 artificial intelligence & image processing |
Source: | Journal of Cloud Computing: Advances, Systems and Applications, Vol 9, Iss 1, Pp 1-20 (2020) |
ISSN: | 2192-113X |
DOI: | 10.1186/s13677-019-0139-6 |
Description: | Task stragglers in MapReduce jobs dramatically impede the execution of data-intensive computing jobs in cloud data centers. These delays stem from the uneven distribution of input data, heterogeneous data nodes, resource contention, and network configurations. Skew in the intermediate data of a MapReduce job causes delay failures because the job completion time is violated. Data-intensive computing frameworks, such as MapReduce or Hadoop YARN, employ HashPartitioner by default. This partitioner may cause intermediate data skew, which results in straggler reducers. In this paper, we strive to make Hadoop YARN more efficient in cloud environments. We present a new partitioning scheme, called the balanced data clusters partitioner (BDCP), that handles straggler Reduce tasks based on sampling of the input data and feedback information about the current processing task. Our extensive experimental results show that BDCP can outperform the default Hadoop HashPartitioner and the Range partitioner. BDCP can assist in straggler mitigation during the reduce phase and minimize the job completion time of MapReduce jobs in data-intensive cloud computing. |
Database: | OpenAIRE |
External link: |
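
The description above attributes straggler reducers to skewed intermediate data under hash partitioning. The following Python sketch illustrates that effect only; it is not the BDCP algorithm from the paper, whose details are not given in this record. It assumes hypothetical key frequencies, uses CRC32 as a stand-in for Java's `hashCode()`, and compares a hash-modulo partitioner with a generic greedy assignment (largest key cluster first, onto the least-loaded reducer) computed from sampled key counts.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_reducers: int) -> int:
    # Same shape as Hadoop's HashPartitioner: (hash & MAX_INT) % numReduceTasks.
    # CRC32 is an assumed stand-in for Java's hashCode().
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def reducer_loads(key_counts, assignment, num_reducers):
    # Sum the number of records each reducer receives under an assignment.
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[assignment[key]] += count
    return loads


def greedy_balanced_assignment(sample_counts, num_reducers):
    # Generic greedy heuristic (not BDCP): place the largest key cluster
    # first, always onto the currently least-loaded reducer.
    loads = [0] * num_reducers
    assignment = {}
    ordered = sorted(sample_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        target = min(range(num_reducers), key=lambda r: loads[r])
        assignment[key] = target
        loads[target] += count
    return assignment


# Hypothetical skewed intermediate key frequencies (illustrative only).
key_counts = Counter({"k0": 500, "k1": 300, "k2": 100, "k3": 50, "k4": 30, "k5": 20})
num_reducers = 3

hash_assignment = {k: hash_partition(k, num_reducers) for k in key_counts}
hash_loads = reducer_loads(key_counts, hash_assignment, num_reducers)

# Here the "sample" is the full count table; a real system would estimate it.
balanced_assignment = greedy_balanced_assignment(key_counts, num_reducers)
balanced_loads = reducer_loads(key_counts, balanced_assignment, num_reducers)

print("hash partitioning loads:  ", hash_loads)
print("greedy balanced loads:    ", balanced_loads)
```

The most heavily loaded reducer determines when the reduce phase finishes, so comparing `max(hash_loads)` with `max(balanced_loads)` gives a rough picture of how much a frequency-aware assignment could shorten it under these assumptions.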