Improving Restore Performance of Packed Datasets in Deduplication Systems via Reducing Persistent Fragmented Chunks
Author: | Min Fu, Chunzhi Wang, Qiang Wang, Hongmu Han, Xinhua Dong, Xinyun Wu, Fang Wang, Yucheng Zhang |
---|---|
Year of publication: | 2020 |
Subject: | 020203 distributed computing; Computer science; Distributed computing; Fragmentation (computing); Backup software; Thrashing; 02 engineering and technology; Computational Theory and Mathematics; Hardware and Architecture; Backup; Signal Processing; Computer data storage; Data_FILES; 0202 electrical engineering, electronic engineering, information engineering; Redundancy (engineering); Data deduplication; Cache |
Source: | IEEE Transactions on Parallel and Distributed Systems. 31:1651-1664 |
ISSN: | 2161-9883 1045-9219 |
DOI: | 10.1109/tpds.2020.2972898 |
Description: | Data deduplication, though efficient at eliminating redundancy in storage systems, introduces chunk fragmentation, which severely degrades restore performance. Rewriting algorithms have been proposed to reduce chunk fragmentation. Typically, backup software aggregates files into larger “tar”-type files for storage. We observe that, in tar-type datasets, a large number of Persistent Fragmented Chunks (PFCs) are rewritten by state-of-the-art rewriting algorithms in every backup, which severely impacts restore performance. We find that PFCs arise from the traditional strategy of storing them alongside other chunks in containers to preserve stream locality, which leaves them always stored in containers with low utilization. We propose DePFC to reduce PFCs. DePFC identifies PFCs, removes them from the containers that preserve stream locality, and groups them together, which increases the utilization of the containers holding them in the subsequent backup and thus prevents them from being rewritten again. We further propose an FC Buffer to avoid both mistaken rewrites of PFCs and grouping together PFCs that cause restore cache thrashing. Experimental results demonstrate that DePFC improves the restore performance of state-of-the-art rewriting algorithms by 44.24-89.42 percent while attaining comparable deduplication efficiency, and that the FC Buffer further improves restore performance. |
Database: | OpenAIRE |
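
The grouping idea in the description can be illustrated with a toy sketch. This is not the paper's DePFC implementation: the container capacity, the utilization threshold, the rule "fragmented in two consecutive backups means persistent", and every function name below are assumptions made for illustration. The sketch only shows why moving chunks that are repeatedly found in low-utilization containers into a shared container raises that container's utilization for the next backup.

```python
from collections import defaultdict

# Toy parameters (assumptions, not values from the paper).
CONTAINER_CAPACITY = 4        # chunks per container
UTILIZATION_THRESHOLD = 0.5   # below this, a container counts as poorly utilized


def container_utilization(stream, location):
    """Fraction of each container's capacity that this backup stream references."""
    used = defaultdict(set)
    for chunk in stream:
        used[location[chunk]].add(chunk)
    return {cid: len(chunks) / CONTAINER_CAPACITY for cid, chunks in used.items()}


def fragmented_chunks(stream, location):
    """Chunks of the stream that sit in containers below the utilization threshold."""
    util = container_utilization(stream, location)
    return {c for c in stream if util[location[c]] < UTILIZATION_THRESHOLD}


def group_persistent(prev_fragmented, stream, location, next_container_id):
    """Relocate chunks fragmented in both the previous and the current backup
    into dedicated containers, filled CONTAINER_CAPACITY chunks at a time."""
    persistent = sorted(prev_fragmented & fragmented_chunks(stream, location))
    new_location = dict(location)
    for i, chunk in enumerate(persistent):
        new_location[chunk] = next_container_id + i // CONTAINER_CAPACITY
    return persistent, new_location


# Hypothetical layout: "a" and "b" each share a container with chunks the
# current stream no longer references; "c".."f" fill container 2 completely.
location = {"a": 0, "x1": 0, "x2": 0, "x3": 0,
            "b": 1, "y1": 1, "y2": 1, "y3": 1,
            "c": 2, "d": 2, "e": 2, "f": 2}
stream = ["a", "b", "c", "d", "e", "f"]

before = fragmented_chunks(stream, location)
persistent, regrouped = group_persistent(before, stream, location, next_container_id=3)
after = fragmented_chunks(stream, regrouped)
```

In this toy layout, "a" and "b" each occupy a quarter of their original containers from the stream's point of view, so both fall below the threshold. Once they share container 3, that container reaches the threshold and neither chunk is flagged in the following pass.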