Mean PB To Failure - Initial results from a long-term study of disk storage patterns at the RACF

Author:	W Strecker-Kellogg, C Hollowell, T Rao, C Caramarcu, A Wong, S A Zaytsev
Publication year:	2015
Subject:	History; Engineering; business.industry; Reliability (computer networking); Node (networking); computer.software_genre; Computer Science Applications; Education; Long term learning; Embedded system; Computer data storage; Data_FILES; Operating system; Key (cryptography); Disk storage; business; computer
Source:	Journal of Physics: Conference Series. 664:042057
ISSN:	1742-6596 1742-6588
DOI:	10.1088/1742-6596/664/4/042057
Description:	The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990s, serving a worldwide, geographically diverse scientific community that is a major contributor to various HEPN projects. A central component of the RACF is the Linux-based worker node cluster that is used for both computing and data storage purposes. It currently has nearly 50,000 computing cores and over 23 PB of storage capacity distributed across 12,000+ (non-SSD) disk drives. The majority of the 12,000+ disk drives provide a cost-effective solution for dCache/XRootD-managed storage, and a key concern is the reliability of this solution over the lifetime of the hardware, particularly as the number of disk drives and the storage capacity of individual drives grow. We report initial results of a long-term study to measure lifetime PB read/written to disk drives in the worker node cluster. We discuss the historical disk drive mortality rate, disk drive manufacturers' published MPTF (Mean PB to Failure) data and how they are correlated to our results. The results help the RACF understand the productivity and reliability of its storage solutions and have implications for other highly-available storage systems (NFS, GPFS, CVMFS, etc.) with large I/O requirements.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::c58abd6da83701f10e92cfe2e56e7bf8 https://doi.org/10.1088/1742-6596/664/4/042057