Popis: |
NAND flash solid-state devices are widely used in many platforms, including consumer electronics and safety-critical embedded systems, because they offer high performance and reliability. In previous work, we developed a novel RAID architecture for NAND flash that protects a system from data loss when individual flash chips fail or wear out. Its mechanisms permit data to be recovered onto a replacement chip when an element in the array reaches its endurance limit. However, the use of this architecture in a hard real-time system is limited, because the memory must be taken off-line while the replacement is carried out, and memory access times therefore become non-deterministic. Moreover, existing hard-disk-based online reconstruction mechanisms do not work well with solid-state RAID, as they are unable to exploit flash-internal operations such as garbage collection and metadata management. In this paper, we present techniques for replacing elements in the array that are approaching their wear-out level without taking the array off-line, thereby increasing system dependability by providing continuous availability with higher I/O performance for hard real-time embedded applications. We have implemented these techniques in our FPGA-based SSD RAID controller. Our simulation results indicate that the run-time device replacement techniques improve the average I/O response time by 39% during replacement. Moreover, these techniques improve the write capability and reduce the time needed to execute a replacement. |