Description: |
Continuous live stream analysis applications are increasingly common. Video-based surveillance, emergency response, disaster recovery, and critical infrastructure monitoring are all examples of such applications. These applications are distributed and typically require significant computing resources (such as a cluster of workstations) for analysis. In addition to live data, many such applications also require access to historical data that was streamed in the past and is now archived. While distributed programming support for traditional high-performance computing applications is fairly mature, existing solutions for live stream analysis applications are still in their early stages and, in our view, inadequate. We explore the system-level value of recognizing temporal properties -- a critical aspect of the application domain. We present "temporal streams", a programming model that supports a higher-level, domain-targeted programming abstraction for such applications. It provides a simple but expressive stream abstraction encompassing transport, manipulation, and storage of streaming data. The semantics of the programming model are tailored to the application domain by explicitly recognizing the temporal aspects of continuous streams, providing a common interface for both time-based retrieval of current streaming data and data persistence. The unifying trait of time enables access to both current streaming data and archived historical data through the same interface; the communication and storage abstractions are the same -- a unified stream data abstraction that uniformly models stream data interactions. "Temporal streams" defines how distributed threads of computation interact implicitly via streams, but it does not impose a particular model of computation that constrains the interactions between distributed actors; it targets loosely coupled distributed systems with no centralized control.
In particular, it targets stream analysis scenarios requiring significant signal processing on heavyweight streams such as audio and video. These unstructured streams are data rich but are not directly interpretable until meaningful features are extracted; consequently, feature detection and subsequent analysis are the major computational requirements. We also use the programming model as a vehicle for exploring systems software design issues, realizing "temporal streams" as a distributed runtime in the tradition of loosely coupled distributed systems with strong communication boundaries. We thoroughly examine the concrete software architecture and elements of implementation. We describe two generations of system implementations, including the broad development philosophy, specific design principles, and salient low-level details. The runtime is designed to be relatively lightweight and suitable as a substrate for higher-level, more domain-specific middleware or application functionality. Even with a relatively simple programming model, a carefully designed system architecture can provide a surprisingly rich and flexible substrate for upper software layers. We evaluate our system implementation in two ways. First, we present a series of quantitative experimental results designed to assess the performance of key primitives in our architecture in isolation. Second, we develop three motivating applications and provide quantitative and qualitative analyses of them, evaluating "temporal streams" in the context of realistic application scenarios. We show that, although it provides the higher-level functionality needed to enable live stream analysis applications, our runtime does not add significant overhead to the stream computation at the core of each application.
Finally, we review the relationship of "temporal streams" (both the programming model and the architecture) to other approaches, including database-oriented Stream Data Management Systems (SDMS), various stream processing engines, stream programming languages, and parallel batch processing systems, as well as traditional distributed programming systems and communication frameworks. |