Complete Fusion for Stateful Streams: Equational Theory of Stateful Streams and Fusion as Normalization-by-Evaluation

Authors: Kiselyov, Oleg; Kobayashi, Tomoaki; Palladinos, Nick
Year of publication: 2024
Subject:
Document type: Working Paper
Description: Processing large amounts of data fast, in constant and small space, is the point of stream processing and the reason for its increasing use. Alas, the most performant, imperative processing code tends to be almost impossible to read, let alone modify, reuse -- or write correctly. We present both a stream compilation theory and its implementation as a portable stream processing library, Strymonas, that lets us assemble complex stream pipelines just by plugging in simple combinators, and yet attain the performance of hand-written imperative loops and state machines. The library supports finite and infinite streams and offers a rich set of combinators: from map, filter, take(while) to flat-map (nesting), zip, map-accumulate and sliding windowing. The combinators may be freely composed, and yet the resulting convoluted imperative code contains no traces of combinator abstractions: no closures, intermediate objects or tuples. The high performance is portable and statically guaranteed, without relying on compiler or black-box optimizations. We greatly exceed in performance the available stream processing libraries in OCaml. The library exists in two versions, OCaml and Scala 3, and supports pluggable backends for code generation (currently: C, OCaml and Scala). Strymonas has been developed in tandem with the equational theory of stateful streams. Our theoretical model can represent all desired pipelines and also guarantees the existence of unique normal forms, which are mappable to (fused) state machines. We describe the normalization algorithm as a form of normalization-by-evaluation. Stream pipeline compilation and optimization are represented as normalization, and are hence deterministic and terminating, with a guaranteed outcome. The equational theory lets us state and prove the correctness of the complete fusion optimization.
Database: arXiv