Improving Data Availability in HDFS through Replica Balancing

Authors: Rhauani Weber Aita Fazul, Paulo Vinicius Cardoso, Patricia Pitthan Barcelos
Contributors: Laboratório de Sistemas de Computação (LSC), Universidade Federal de Santa Maria (UFSM), Santa Maria, RS, Brazil
Language: English
Year: 2019
Subject:
Source: 2019 9th Latin-American Symposium on Dependable Computing (LADC), Nov 2019, Natal, Brazil, pp. 1-6
DOI: 10.1109/LADC48089.2019.8995674
Description: Over time, the data distribution across an HDFS cluster may become unbalanced. The HDFS Balancer is a tool provided by Apache Hadoop that redistributes blocks by moving them from nodes with higher utilization to nodes with lower utilization. However, during block rearrangement, the HDFS Balancer does not aim to increase the availability of the data. This work presents a strategy that gives priority to block movements which increase the overall availability of the data stored in HDFS. This increases fault tolerance, since placing blocks across a larger number of racks tends to reduce the chances of data loss. To evaluate the implementation, an experimental investigation was conducted to measure system performance after balancing the cluster with the proposed solution.
Database: OpenAIRE
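
The description outlines prioritizing block movements that spread a block's replicas across more racks. For context, the stock tool is invoked as `hdfs balancer -threshold <percent>` and considers only DataNode utilization, not rack spread. The sketch below is not the authors' implementation; it only illustrates the underlying idea under a simplifying assumption, and the class and method names are hypothetical: a candidate move can be scored by how much it changes the number of distinct racks holding the block's replicas, and moves with a positive gain would be preferred.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Illustrative sketch only: score a candidate block move by its effect on the
// number of distinct racks that hold the block's replicas.
public class RackSpreadPriority {

    // Number of distinct racks covered by a block's replicas.
    static int distinctRacks(List<String> replicaRacks) {
        return new HashSet<>(replicaRacks).size();
    }

    // Gain in rack spread if one replica is moved from sourceRack to targetRack.
    static int rackSpreadGain(List<String> replicaRacks, String sourceRack, String targetRack) {
        List<String> after = new ArrayList<>(replicaRacks);
        after.remove(sourceRack);   // one replica leaves the source rack
        after.add(targetRack);      // and lands on the target rack
        return distinctRacks(after) - distinctRacks(replicaRacks);
    }

    public static void main(String[] args) {
        // Two of the three replicas share rackA: the block spans 2 racks.
        List<String> racks = List.of("rackA", "rackA", "rackB");

        // Moving one rackA replica to rackC raises the spread to 3 racks (gain +1),
        // so an availability-aware balancer would prioritize this move.
        System.out.println(rackSpreadGain(racks, "rackA", "rackC")); // 1

        // Moving it to rackB keeps the spread at 2 racks (gain 0).
        System.out.println(rackSpreadGain(racks, "rackA", "rackB")); // 0
    }
}
```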