RAIDP
Authors: | Aviad Zuck, Michael Factor, Eitan Rosenfeld, Dan Tsafrir, Nadav Amit |
---|---|
Year of publication: | 2020 |
Subject: | Computer science; Distributed computing; 020206 networking & telecommunications; 02 engineering and technology; Data loss; Durability; Replication (computing); 020204 information systems; Distributed data store; Data_FILES; 0202 electrical engineering, electronic engineering, information engineering; Point (geometry); Erasure code; Design space |
Source: | EuroSys |
DOI: | 10.1145/3342195.3387546 |
Description: | Distributed storage systems often triplicate data to reduce the risk of permanent data loss, thereby tolerating at least two simultaneous disk failures at the price of 2/3 of the capacity. To reduce this price, some systems utilize erasure coding. But this optimization is usually only applied to cold data, because erasure coding might hinder performance for warm data. We propose RAIDP, a new point in the distributed storage design space between replication and erasure coding. RAIDP maintains only two replicas, rather than three or more. It increases durability by utilizing small disk "add-ons" for storing intra-disk erasure codes that are local to the server but fail independently from the disk. By carefully laying out the data, the add-ons allow RAIDP to recover from simultaneous disk failures (add-ons can be stacked to withstand an arbitrary number of failures). RAIDP retains many of the benefits of replication, trading off some performance and availability for substantially reduced storage requirements, networking overheads, and their related costs. We implement RAIDP in HDFS, which triplicates by default. We show that baseline RAIDP achieves performance close to that of HDFS with only two replicas, and performs within 21% of the default triplicating HDFS with an update-oriented variant, while halving the storage and networking overheads and providing similar durability. |
Database: | OpenAIRE |
External link: |
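
The capacity figures quoted in the description can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function names are hypothetical, the (6, 3) erasure-code parameters are an illustrative assumption, and the capacity of RAIDP's add-ons is not modeled because the description does not state it.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Fraction of raw capacity spent on redundant copies under n-way replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies - 1, copies)


def extra_copies(copies: int) -> int:
    """Number of redundant copies stored (and shipped over the network) per block."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return copies - 1


def erasure_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Fraction of raw capacity spent on parity for a (data, parity) erasure code."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need data_blocks >= 1 and parity_blocks >= 0")
    return Fraction(parity_blocks, data_blocks + parity_blocks)


if __name__ == "__main__":
    # Triplication: two of every three stored bytes are redundant.
    print("3 replicas:", replication_overhead(3))
    # Two replicas, as RAIDP keeps (add-on capacity ignored; an assumption).
    print("2 replicas:", replication_overhead(2))
    # Redundant copies drop from 2 to 1, i.e. the overhead is halved.
    print("extra copies:", extra_copies(3), "->", extra_copies(2))
    # Illustrative erasure code with 6 data and 3 parity blocks (assumed parameters).
    print("EC(6, 3):", erasure_overhead(6, 3))
```

Exact fractions are used rather than floats so the "2/3 of the capacity" figure appears literally in the output.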