Scaling Ordered Stream Processing on Shared-Memory Multicores

Authors:	Ganesan Ramalingam, Kaushik Rajan, Guna Prasaad
Language:	English
Year of publication:	2018
Subject:	FOS: Computer and information sciences; 050101 languages & linguistics; Multi-core processor; Concurrent data structure; Data parallelism; Computer science; Dataflow; 05 social sciences; Task parallelism; Databases (cs.DB); 02 engineering and technology; Dynamic priority scheduling; Parallel computing; Stream processing; Shared memory; Computer Science - Databases; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences
Source:	BIRTE
Description:	Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple opportunities for parallelizing its execution, in the form of data, pipeline, and task parallelism. However, many important applications require that processing of the stream be ordered, meaning that inputs are processed in the same order as they arrive. There is a fundamental conflict between ordered processing and parallelizing the streaming computation. This paper focuses on the problem of effectively parallelizing ordered streaming computations on a shared-memory multicore machine. We first address the key challenges in exploiting data parallelism in the ordered setting. We present a low-latency, non-blocking concurrent data structure to order outputs produced by concurrent workers on an operator. We also propose a new approach to parallelizing partitioned stateful operators that can handle load imbalance across partitions effectively and mostly avoid delays due to ordering. We illustrate the trade-offs and effectiveness of our concurrent data structures on micro-benchmarks and streaming queries from the TPCx-BB benchmark. We then present an adaptive runtime that dynamically maps the exposed parallelism in the computation to that of the machine. We propose several intuitive scheduling heuristics and compare them empirically on the TPCx-BB queries. We find that for streaming computations, heuristics that exploit as much pipeline parallelism as possible perform better than those that seek to exploit data parallelism.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4183be8d92265ee5b4fd5bc090863c32 http://arxiv.org/abs/1803.11328
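
Illustrative note: the description refers to ordering the outputs that concurrent workers produce on one operator. The Python sketch below shows that general idea with a simple lock-based reorder buffer keyed by input sequence number. It is not the paper's data structure, which is described as non-blocking and low-latency; the names `ReorderBuffer` and `run_ordered` are hypothetical and chosen only for this example.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor


class ReorderBuffer:
    """Releases results in input order even when workers finish out of order.

    Assumption: a single lock guards the buffer. This is a simplification
    and does not reproduce a non-blocking design.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []   # min-heap of (sequence number, result)
        self._next_seq = 0   # sequence number that may be released next
        self.emitted = []    # results released so far, in input order

    def put(self, seq, result):
        with self._lock:
            heapq.heappush(self._pending, (seq, result))
            # Drain every buffered result whose turn has come.
            while self._pending and self._pending[0][0] == self._next_seq:
                _, ready = heapq.heappop(self._pending)
                self.emitted.append(ready)
                self._next_seq += 1


def run_ordered(operator, inputs, workers=4):
    """Apply `operator` to each input in parallel and return ordered outputs."""
    buffer = ReorderBuffer()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for seq, item in enumerate(inputs):
            # Bind seq and item as defaults so each task keeps its own values.
            pool.submit(lambda s=seq, x=item: buffer.put(s, operator(x)))
    # Leaving the `with` block waits for all submitted tasks to finish.
    return buffer.emitted
```

Calling `run_ordered(lambda x: x * x, range(8))` returns the squares in input order regardless of which worker finishes first. A result that arrives early waits in the heap until all earlier sequence numbers have been released, which is the ordering delay the description says the authors aim to reduce. The sketch does not handle an operator that raises an exception; in that case later results would remain buffered.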