Tula: A disk latency aware balancing and block placement strategy for Hadoop

Authors: Srikant Padala, Vikram Kumar, Balaji Kammili, Janakiram Dharanipragada
Publication year: 2017
Subject:
Source: IEEE BigData
Description: Heterogeneity can arise in a Hadoop cluster for various reasons. This work focuses primarily on heterogeneity caused by varying read/write latency of disks. When disk latencies are similar across the datanodes, balancing the blocks uniformly is a suitable choice. Over time, however, disk latencies can increase due to mechanical problems and bad sectors. Further, disks that crash and become non-functional are replaced with newer disks, which may be of a newer generation and have a higher RPM. This introduces disk heterogeneity into an otherwise homogeneous cluster, and balancing uniformly according to disk utilization may not give optimal job runtime. To address this issue, we propose a disk latency aware balancer, which balances the cluster taking both disk latency and disk space utilization into consideration. This balancing strategy ensures that a low-latency disk receives a higher number of blocks than a high-latency disk. Furthermore, we introduce a custom block placement strategy that considers disk latency and other factors. Our preliminary results show an improvement of up to 20% in job runtime.
Database: OpenAIRE
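
The description states that a low-latency disk should receive more blocks than a high-latency one, but the record does not give the formula the authors use. The following is a minimal sketch of one way such a rule could look, assuming target block counts proportional to the inverse of each datanode's measured disk latency. The function name `latency_weighted_targets`, the node names, and the inverse-latency weighting are all hypothetical, and the sketch ignores disk space utilization, which the described balancer also takes into account.

```python
def latency_weighted_targets(latencies_ms, total_blocks):
    """Split total_blocks across datanodes in proportion to 1 / latency.

    latencies_ms: dict mapping a (hypothetical) datanode name to its
                  measured disk latency in milliseconds.
    Assumption: inverse-latency weighting; not the paper's actual formula.
    """
    if not latencies_ms:
        raise ValueError("need at least one datanode")
    if any(lat <= 0 for lat in latencies_ms.values()):
        raise ValueError("latencies must be positive")

    # Lower latency gives a larger weight.
    weights = {node: 1.0 / lat for node, lat in latencies_ms.items()}
    weight_sum = sum(weights.values())

    # Ideal (fractional) share of blocks for each node.
    exact = {node: total_blocks * w / weight_sum for node, w in weights.items()}

    # Round down, then hand the leftover blocks to the nodes with the
    # largest fractional parts so the targets sum to total_blocks.
    targets = {node: int(share) for node, share in exact.items()}
    leftover = total_blocks - sum(targets.values())
    by_fraction = sorted(exact, key=lambda n: exact[n] - targets[n], reverse=True)
    for node in by_fraction[:leftover]:
        targets[node] += 1
    return targets


if __name__ == "__main__":
    # dn1 has a faster disk than dn2 and dn3 (made-up latencies).
    print(latency_weighted_targets({"dn1": 5.0, "dn2": 10.0, "dn3": 10.0}, 100))
```

A real balancer would additionally cap each target by the node's free disk space and respect replica placement constraints; those steps are omitted here.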