Distributed Data Replication and Access Optimization for LHCb Storage System - A Position Paper

Authors: Mikhail Hushchyn, Philippe Charpentier, Andrey Ustyuzhanin
Year of publication: 2015
Subject:
Source: KDIR
Description: This paper shows how machine learning algorithms and statistical methods can be applied to data management in hybrid data storage systems. Such systems combine two storage types: keeping rarely used data on cheap, slow storage and frequently used data on fast, expensive storage helps achieve an optimal performance/cost ratio for the system. We use classification algorithms to estimate the probability that the data will be used frequently in the future. Then, using risk analysis, we decide where the data should be stored. We show how to estimate the optimal number of replicas of the data using regression algorithms and a Hidden Markov Model. Based on this probability, the risks, and the optimal number of replicas, our system finds an optimal data distribution in the hybrid storage system. We present the results of simulating our method on the LHCb hybrid data storage.
Database: OpenAIRE
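The placement step described in the abstract (a classifier's probability combined with risk analysis) can be illustrated with a minimal expected-cost sketch. It is not the authors' method: the cost model, the names `CostModel`, `expected_costs`, `choose_storage` and `break_even_probability`, and the labels "disk" (fast, expensive storage) and "tape" (cheap, slow storage) are all hypothetical assumptions. The probability that the data becomes popular is passed in directly, standing in for the output of a trained classifier.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical per-dataset costs in arbitrary units (assumptions, not from the paper).
    disk_cost: float     # cost of keeping the data on fast, expensive storage
    tape_cost: float     # cost of keeping the data on cheap, slow storage
    miss_penalty: float  # extra cost if data left on slow storage turns out to be popular


def expected_costs(p_popular: float, model: CostModel) -> dict:
    """Expected cost ("risk") of each placement, given P(data will be popular)."""
    if not 0.0 <= p_popular <= 1.0:
        raise ValueError("p_popular must be a probability in [0, 1]")
    return {
        "disk": model.disk_cost,
        # Slow storage is cheap, but we pay the penalty with probability p_popular.
        "tape": model.tape_cost + p_popular * model.miss_penalty,
    }


def choose_storage(p_popular: float, model: CostModel) -> str:
    """Pick the placement with the lowest expected cost (ties go to "disk")."""
    costs = expected_costs(p_popular, model)
    return min(costs, key=costs.get)


def break_even_probability(model: CostModel) -> float:
    """Probability above which fast storage has the lower expected cost."""
    return (model.disk_cost - model.tape_cost) / model.miss_penalty


if __name__ == "__main__":
    model = CostModel(disk_cost=10.0, tape_cost=1.0, miss_penalty=50.0)
    for p in (0.05, 0.6):
        print(p, choose_storage(p, model), expected_costs(p, model))
    print("break-even:", break_even_probability(model))
```

With these illustrative costs, data with a 5% chance of becoming popular stays on slow storage, while data with a 60% chance goes to fast storage. The switch happens at a break-even probability of (10 − 1) / 50 = 0.18. A linear expected-cost rule is the simplest form of risk analysis; the paper's actual risk model and its replica estimation (regression and a Hidden Markov Model) are not reproduced here.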