How fast can one resize a distributed file system?

Authors:	Gabriel Antoniu, Matthieu Dorier, Nathanaël Cheriere
Contributors:	Scalable Storage for Clouds and Beyond (KerData); Inria Rennes – Bretagne Atlantique; Institut National de Recherche en Informatique et en Automatique (Inria); SYSTÈMES LARGE ÉCHELLE (IRISA-D1); Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Université de Bretagne Sud (UBS); Institut National des Sciences Appliquées - Rennes (INSA Rennes); Institut National des Sciences Appliquées (INSA); Université de Rennes (UNIV-RENNES); École normale supérieure - Rennes (ENS Rennes); Centre National de la Recherche Scientifique (CNRS); Université de Rennes 1 (UR1); CentraleSupélec; IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique); Institut Mines-Télécom [Paris] (IMT); Argonne National Laboratory [Lemont] (ANL); Université de Rennes (UR); IMT Atlantique (IMT Atlantique)
Year of publication:	2020
Subject:	Elastic storage; Computer Networks and Communications; Computer science; Distributed computing; 02 engineering and technology; Bottleneck; Theoretical Computer Science; Resource (project management); Artificial Intelligence; Factor (programming language); Distributed data store; 0202 electrical engineering, electronic engineering, information engineering; Malleable File System; Distributed File System; Duration (project management); 020206 networking & telecommunications; Commission; Malleable distributed file system; Hardware and Architecture; 020201 artificial intelligence & image processing; Decommission; [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]; Software; Model
Source:	Journal of Parallel and Distributed Computing, Elsevier, 2020, 140, pp.80-98. ⟨10.1016/j.jpdc.2020.02.001⟩
ISSN:	0743-7315 1096-0848
Description:	Efficient resource utilization becomes a major concern as large-scale distributed computing infrastructures keep growing in size. Malleability, the possibility for resource managers to dynamically increase or decrease the amount of resources allocated to a job, is a promising way to save energy and costs. However, state-of-the-art parallel and distributed storage systems have not been designed with malleability in mind, mainly because of the supposedly high cost of the data transfers required by resizing operations. Nevertheless, as network and storage technologies evolve, old assumptions about potential bottlenecks can be revisited. In this study, we evaluate the viability of malleability as a design principle for a distributed storage system. Specifically, we model the minimal duration of the commission and decommission operations. To show how our models can be used in practice, we evaluate the performance of these operations in HDFS, a relevant state-of-the-art distributed file system. We show that the existing decommission mechanism of HDFS performs well when the network is the bottleneck, but can be accelerated by up to a factor of 3 when storage is the limiting factor. We also show that the commission in HDFS can be substantially accelerated. Using the insights provided by our model, we suggest improvements to speed up both operations in HDFS. We discuss how the proposed models can be generalized to distributed file systems with different assumptions and what perspectives are open for the design of efficient malleable distributed file systems.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a2015294e4d2b100566cb1d9da46ab9e https://doi.org/10.1016/j.jpdc.2020.02.001