Description: |
With the rapid growth of data-intensive jobs and the increasing heterogeneity of storage hardware, disaggregated storage architectures are being adopted to improve the operational cost efficiency of data centers. The hardware heterogeneity and mixed configurations of disaggregated storage systems, along with the diversity of workloads, often make it difficult for administrators to operate them optimally. In this work, we investigate model-based reinforcement learning (RL) schemes that automate system operations and maintain storage performance across various system settings and workloads in self-managed storage systems. Specifically, we propose a novel configurable model structure in which a system environment is abstracted as a two-level hierarchy of storage devices and a platform, so that the environment can be reconfigured according to a given system specification. Using this model structure, we implement CoMoRL, a configurable model-based RL framework in which RL agents are trained on model variants representing a variety of storage system specifications; as a result, the learned management policy is highly robust to the diverse operating conditions of real-world storage systems. We evaluate CoMoRL on a storage cluster built on NVMe-oF devices and demonstrate that the framework adapts to different scenarios, such as volume placement with Kubernetes and primary affinity control with Ceph. The learned management policy outperforms an IOPS-based heuristic method and a model-based method by 0.7%~5.1% and 11.8%~29.7%, respectively, across various Kubernetes system specifications, and by 1.6%~5.6% and 8.2%~16.5%, respectively, across various Ceph system specifications, without requiring model or policy retraining. This zero-shot adaptation capability makes it possible to realize RL-based self-managing storage systems in data centers with frequent system changes.