The Node Status as a Prioritization Strategy for Replica Balancing in a HDFS Cluster

Authors: Patricia Pitthan Barcelos, Rhauani Weber Aita Fazul
Contributors: Laboratório de Sistemas de Computação (LSC), Universidade Federal de Santa Maria (UFSM), Universidade Federal de Santa Maria = Federal University of Santa Maria [Santa Maria, RS, Brazil] (UFSM)
Year of publication: 2020
Subject:
Source: Anais Estendidos do Simpósio Brasileiro de Engenharia de Sistemas Computacionais
Anais Estendidos do Simpósio Brasileiro de Engenharia de Sistemas Computacionais, Nov 2020, Florianópolis, Brazil. pp.103-108, ⟨10.5753/sbesc_estendido.2020.13097⟩
DOI: 10.5753/sbesc_estendido.2020.13097
Description: Data replication is the main fault tolerance mechanism of HDFS, the Hadoop Distributed File System. Although replication is essential to ensure high availability and reliability, the replicas might not always be placed evenly among the nodes. The HDFS Balancer is an integrated solution of Apache Hadoop that performs replica balancing by rearranging the data blocks stored in the file system. The Balancer, however, demands a high computational effort from the nodes during its operation. This work presents a customization of the HDFS Balancer that considers the status of the nodes as a strategy to minimize the overhead that the balancing operation causes in the cluster. To this end, metrics obtained at runtime are used to prioritize the nodes during data redistribution, so that transfers occur primarily between nodes with low communication traffic. In addition, the Balancer operates toward a minimum balance level, reducing the number of data transfers required to even out the data stored in the cluster. The evaluation results showed that the proposed customization reduces the time and bandwidth needed to bring the system into balance.
Database: OpenAIRE