Automation and prioritization of replica balancing in HDFS
Author: | Rhauani Weber Aita Fazul, Patricia Pitthan Barcelos |
---|---|
Contributors: | Laboratório de Sistemas de Computação (LSC), Universidade Federal de Santa Maria (UFSM) = Federal University of Santa Maria, Santa Maria, RS, Brazil |
Language: | English |
Publication year: | 2021 |
Subject: | File system; Distributed File System; replica balancing; racks; reliability; data availability; Replica; Reliability (computer networking); Automation; Storage model; Personalization; Daemon; Operating system; Computer science; Software engineering; Information systems; Engineering and technology; Electrical engineering, electronic engineering, information engineering; [INFO]Computer Science [cs]; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; [INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF]; [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] |
Source: | SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, Mar 2021, Virtual Event, Republic of Korea, pp. 235-238, ⟨10.1145/3412841.3442075⟩ |
DOI: | 10.1145/3412841.3442075 |
Description: | The Hadoop Distributed File System (HDFS) is a reliable storage engine designed to run on commodity hardware. To provide reliability and read performance, HDFS uses a storage model based on data replication and works best when file blocks are evenly spread across the cluster. The HDFS Balancer is an Apache Hadoop daemon created for replica balancing on the file system. However, the tool is not optimized to meet potential reliability and availability demands during data redistribution, and it must be configured and triggered manually. In this work, we present a solution for replica balancing that combines a proactive and a reactive approach. The proactive approach relies on active monitoring of the computational environment by an agent-server structure. The reactive approach is based on customizing the default operation policy of the HDFS Balancer. As shown by the evaluation results, the solution automates the use of the HDFS Balancer and allows it to run according to the reliability of the racks and the availability of the data stored in the cluster. |
Database: | OpenAIRE |
External link: |