Automation and prioritization of replica balancing in HDFS

Authors: Rhauani Weber Aita Fazul, Patricia Pitthan Barcelos
Contributors: Laboratório de Sistemas de Computação (LSC), Universidade Federal de Santa Maria = Federal University of Santa Maria [Santa Maria, RS, Brazil] (UFSM)
Language: English
Year of publication: 2021
Subject:
Source: SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, Mar 2021, Virtual Event (Republic of Korea). pp. 235-238, ⟨10.1145/3412841.3442075⟩
DOI: 10.1145/3412841.3442075
Description: The Hadoop Distributed File System (HDFS) is a reliable storage engine designed to run on commodity hardware. To provide reliability and read performance, HDFS uses a storage model based on data replication and works best when file blocks are evenly spread across the cluster. The HDFS Balancer is an Apache Hadoop daemon created for replica balancing on the file system. However, the tool is not optimized to meet potential usage demands for reliability and availability during data redistribution, and it must be configured and triggered manually. In this work, we present a solution for replica balancing that combines a proactive and a reactive approach. The former is addressed through active monitoring of the computational environment by an agent-server structure. The latter is based on customizing the default operation policy of the HDFS Balancer. As shown by the evaluation results, the solution automates the use of the HDFS Balancer and allows it to execute according to the reliability of the racks and the availability of the data stored in the cluster.
Database: OpenAIRE
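
Note: the description says HDFS works best when file blocks are evenly spread across the cluster. As a rough illustration of what "evenly spread" means for the stock HDFS Balancer, the sketch below applies its utilization threshold rule: a DataNode counts as balanced when its utilization is within a threshold (10 percentage points by default) of the overall cluster utilization. This is a simplified, hypothetical sketch, not the authors' solution. The function names and the dictionary layout are assumptions made for illustration, and the rack reliability and data availability priorities mentioned in the abstract are not modeled because the record gives no details about them.

```python
def cluster_utilization(nodes):
    """Return overall cluster utilization as a percentage of total capacity."""
    used = sum(n["used"] for n in nodes)
    capacity = sum(n["capacity"] for n in nodes)
    return 100.0 * used / capacity


def classify(nodes, threshold=10.0):
    """Label each node 'over', 'under', or 'balanced' (simplified three-way split).

    A node is 'over' if its utilization exceeds the cluster average by more
    than `threshold` percentage points, 'under' if it falls below the average
    by more than `threshold`, and 'balanced' otherwise.
    """
    average = cluster_utilization(nodes)
    labels = {}
    for n in nodes:
        utilization = 100.0 * n["used"] / n["capacity"]
        if utilization > average + threshold:
            labels[n["name"]] = "over"
        elif utilization < average - threshold:
            labels[n["name"]] = "under"
        else:
            labels[n["name"]] = "balanced"
    return labels


def is_balanced(nodes, threshold=10.0):
    """Return True when every node is within the threshold of the average."""
    return all(label == "balanced" for label in classify(nodes, threshold).values())


# Hypothetical three-node cluster; sizes are in arbitrary units.
nodes = [
    {"name": "dn1", "used": 90, "capacity": 100},
    {"name": "dn2", "used": 50, "capacity": 100},
    {"name": "dn3", "used": 40, "capacity": 100},
]
labels = classify(nodes)
```

In this example the cluster average is 60%, so with the default threshold dn1 is over-utilized, dn3 is under-utilized, and dn2 sits exactly on the lower bound and still counts as balanced. A monitoring agent of the kind the abstract describes could run a check like this periodically and trigger the Balancer only when the result is unbalanced, though how the authors actually decide this is not stated in the record.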