Scalable in situ scientific data encoding for analytical query processing
Author: | Nagiza F. Samatova, Michael E. Papka, John Jenkins, Saurabh V. Pendse, Venkatram Vishwanath, Sriram Lakshminarasimhan, David A. Boyuka, Xiaocheng Zou |
---|---|
Year of publication: | 2013 |
Subject: |
020203 distributed computing; Speedup; Computer science; Existential quantification; Distributed computing; Search engine indexing; 02 engineering and technology; Exascale computing; Visualization; Rendering (computer graphics); 020204 information systems; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Raw data |
Source: | HPDC |
DOI: | 10.1145/2462902.2465527 |
Description: | The process of scientific data analysis in high-performance computing environments has been evolving along with the advancement of computing capabilities. With the onset of exascale computing, the increasing gap between compute performance and I/O bandwidth has rendered the traditional method of post-simulation processing tedious. Despite the challenges due to increased data production, there exists an opportunity to benefit from "cheap" computing power to perform query-driven exploration and visualization during simulation time. To accelerate such analyses, applications traditionally augment raw data with large indexes, post-simulation, which are then repeatedly utilized for data exploration. However, the generation of current state-of-the-art indexes involves compute- and memory-intensive processing, rendering them inapplicable in an in situ context. In this paper we propose DIRAQ, a parallel in situ, in-network data encoding and reorganization technique that enables the transformation of simulation output into a query-efficient form, with negligible runtime overhead to the simulation run. DIRAQ begins with an effective core-local, precision-based encoding approach, which incorporates an embedded compressed index that is 3-6x smaller than current state-of-the-art indexing schemes. DIRAQ then applies an in-network index merging strategy, enabling the creation of aggregated indexes ideally suited for spatial-context querying that speed up query responses by up to 10x versus alternative techniques. We also employ a novel aggregation strategy that is topology-, data-, and memory-aware, resulting in efficient I/O and yielding overall end-to-end encoding and I/O time that is less than that required to write the raw data with MPI collective I/O. |
Database: | OpenAIRE |