ProSPECT: Proactive Storage Using Provenance for Efficient Compute and Tiering

Authors: Suparna Bhattacharya, Ancy Sarah Tom, Muthukumar Murugan, Doug Voigt, Madhumita Bharde
Year of publication: 2021
Subject:
Source: Transactions of the Indian National Academy of Engineering. 7:219-234
ISSN: 2662-5423, 2662-5415
Description: AI and analytics applications are good at deriving meaningful insights from data, but they do not always cope well with the storage management challenges that come with a high pace of data generation. At the same time, a conventional data storage and management layer is not optimized to derive timely insights and value from huge volumes of data. This problem is rooted in a classical cross-layer dilemma wherein neither the application nor the storage layer has the deep knowledge needed to optimize the whole system. We resolve this omniscience dilemma by introducing ProSPECT, a set of techniques to proactively optimize analytics computations and data storage. ProSPECT enables a data fabric to become aware of the purpose and relevance of stored data by intercepting the lineage of workflows under execution within existing analytics frameworks. Partial analytics computations can then be initiated proactively by the data fabric layer, where data is stored and managed. ProSPECT provides analytics applications with relevant data or precomputed insights and alleviates storage management challenges using proactive tiering and data approximation. We describe experiments with application case studies using Apache Spark and Alluxio to demonstrate an order-of-magnitude reduction in the storage space occupied in the fastest tier and in time to value for analytics applications.
Database: OpenAIRE