HybridFS - a high performance and balanced file system framework with multiple distributed file systems
Author: | Hongji Yang, Lidong Zhang, Yeh-Ching Chung, Tse-Chuan Hsu, Yongwei Wu, Ruini Xue |
---|---|
Language: | English |
Year of publication: | 2017 |
Subject: |
Java; Computer science; Stub file; 02 engineering and technology; computer.software_genre; Design rule for Camera File system; Server; Data file; 0202 electrical engineering, electronic engineering, information engineering; Data_FILES; Versioning file system; SSH File Transfer Protocol; Distributed File System; File system fragmentation; computer.programming_language; File system; 020203 distributed computing; Indexed file; Computer file; Device file; computer.file_format; Everything is a file; Unix file types; Virtual file system; Torrent file; File Control Block; Self-certifying File System; Journaling file system; Operating system; File area network; 020201 artificial intelligence & image processing; Fork (file system); computer |
Source: | COMPSAC (1) |
Description: | In the big data era, distributed file systems are becoming increasingly important because of their scale-out capability, high availability, and high performance. Different distributed file systems may have different design goals: some, such as GlusterFS, are designed to perform well on small-file operations, while others, such as the Hadoop Distributed File System (HDFS), are designed for large-file operations. As big data applications diverge, a distributed file system may perform well for some applications but poorly for others; that is, there is no universal distributed file system that delivers good performance for all applications. In this paper, we propose HybridFS, a hybrid file system framework that can deliver satisfactory performance for all applications. HybridFS is composed of multiple distributed file systems and integrates their advantages. On top of these distributed file systems, HybridFS adds a metadata management server that performs three functions: file placement, partial metadata store, and dynamic file migration. File placement is decided by a decision tree. The partial metadata store is used for files smaller than a few hundred bytes to increase throughput. Dynamic file migration balances storage usage across the distributed file systems without throttling performance. We implemented HybridFS in Java on eight nodes, using Ceph, HDFS, and GlusterFS as the underlying distributed file systems. Experimental results show that, in the best case, HybridFS improves read/write performance by up to 30% over a single distributed file system. In addition, as long as the difference in storage usage among the distributed file systems is less than 40%, the performance of HybridFS is guaranteed, i.e., there is no performance degradation.
|
Database: | OpenAIRE |
External link: |
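The description names three metadata-server functions (decision-tree file placement, a partial metadata store for files of a few hundred bytes, and storage-balancing file migration) but gives no details. The Java sketch below shows one way these pieces could fit together. It is not the authors' implementation. Only the backend names (Ceph, HDFS, GlusterFS) come from the description. Everything else is a hypothetical assumption: the class `HybridPlacementSketch`, the 512-byte inline limit, the 1 MiB and 64 MiB tree splits, the reading of "partial metadata store" as "the metadata server keeps tiny files itself", and the reading of the 40% figure as a gap between backend usage ratios. The sketch does bookkeeping only and performs no real I/O.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a HybridFS-style metadata server (bookkeeping only, no real I/O).
// Every threshold and decision-tree split here is an illustrative assumption.
public class HybridPlacementSketch {

    // Where a file is placed: in the metadata server itself, or in one backend DFS.
    public enum Target { INLINE_METADATA, GLUSTERFS, CEPH, HDFS }

    static final long INLINE_LIMIT_BYTES = 512;      // assumed "few hundred bytes" cut-off
    static final long SMALL_FILE_BYTES = 1L << 20;   // assumed split: 1 MiB
    static final long LARGE_FILE_BYTES = 64L << 20;  // assumed split: 64 MiB

    private final Map<String, Long> inlineFiles = new HashMap<>();  // tiny files kept by the server
    private final Map<Target, Long> usedBytes = new EnumMap<>(Target.class);
    private final long capacityPerBackend;
    private final double imbalanceThreshold;

    public HybridPlacementSketch(long capacityPerBackend, double imbalanceThreshold) {
        this.capacityPerBackend = capacityPerBackend;
        this.imbalanceThreshold = imbalanceThreshold;
        for (Target t : new Target[] {Target.GLUSTERFS, Target.CEPH, Target.HDFS}) {
            usedBytes.put(t, 0L);
        }
    }

    // A fixed, hand-written decision tree on a single feature (file size).
    public static Target place(long sizeBytes) {
        if (sizeBytes < INLINE_LIMIT_BYTES) return Target.INLINE_METADATA;
        if (sizeBytes < SMALL_FILE_BYTES) return Target.GLUSTERFS;  // small-file branch
        if (sizeBytes >= LARGE_FILE_BYTES) return Target.HDFS;      // large-file branch
        return Target.CEPH;                                         // everything in between
    }

    // Records a write and returns where the file was placed.
    public Target write(String path, long sizeBytes) {
        Target t = place(sizeBytes);
        if (t == Target.INLINE_METADATA) {
            inlineFiles.put(path, sizeBytes);
        } else {
            usedBytes.merge(t, sizeBytes, Long::sum);
        }
        return t;
    }

    public boolean isInline(String path) {
        return inlineFiles.containsKey(path);
    }

    // Fraction of a backend's capacity currently in use.
    public double usage(Target t) {
        return (double) usedBytes.get(t) / capacityPerBackend;
    }

    // True when the fullest and emptiest backends differ by more than the threshold.
    public boolean needsMigration() {
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (Target t : usedBytes.keySet()) {
            max = Math.max(max, usage(t));
            min = Math.min(min, usage(t));
        }
        return max - min > imbalanceThreshold;
    }

    // Shifts up to `bytes` of accounted usage from the fullest to the emptiest backend.
    public void migrate(long bytes) {
        Target from = null;
        Target to = null;
        for (Target t : usedBytes.keySet()) {
            if (from == null || usage(t) > usage(from)) from = t;
            if (to == null || usage(t) < usage(to)) to = t;
        }
        long moved = Math.min(bytes, usedBytes.get(from));
        usedBytes.merge(from, -moved, Long::sum);
        usedBytes.merge(to, moved, Long::sum);
    }

    public static void main(String[] args) {
        HybridPlacementSketch s = new HybridPlacementSketch(100L << 20, 0.40);
        System.out.println(s.write("/tiny.txt", 200));         // INLINE_METADATA
        System.out.println(s.write("/log.bin", 4L << 20));     // CEPH
        System.out.println(s.write("/video.bin", 70L << 20));  // HDFS
        System.out.println(s.needsMigration());                // true
        s.migrate(30L << 20);
        System.out.println(s.needsMigration());                // false
    }
}
```

In this sketch, file size is the only feature in the decision tree. A real placement tree could also weigh features such as access pattern. Real migration would also move whole files and respect each backend's strengths, whereas `migrate` here only shifts byte counts to show the check-then-rebalance loop.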