SpeedStream: A real-time stream data processing platform in the cloud

Author: Li Zhao, Zhang Chuang, Xu Kefu
Year of publication: 2015
Subject:
Source: IPCCC
DOI: 10.1109/pccc.2015.7410267
Description: SpeedStream is a general-purpose distributed platform for processing massive data flows, characterized by low coupling, high availability, low latency, and high scalability. Focusing on the core technologies of a real-time stream computing platform in a cloud environment, this paper presents the research behind the system and its implementation. First, to address the availability of the real-time stream computing platform, we design a high-availability framework based on Zookeeper. It ensures timely fault detection and recovery at both the process level and the node level by monitoring the heartbeat of each module and applying a fault migration strategy. Second, to broaden the types of application the platform supports, we design a real-time stream computing model based on a directed graph with sources and sinks, using directed cycle detection and iteration protection. The model not only satisfies the needs of common DAG computing services but also supports iterative computing services that include directed cycles, bidirectional arcs, and annular arcs. In addition, the platform can realize user-specific task scheduling strategies by establishing a task allocation matrix and optimizing the task allocation model. Finally, to solve many-to-many dynamic load balancing between tasks, we apply a stateful scheduler and a distributed session table, which overcomes the difficulty of maintaining session consistency without a global session table. We also verify the convergence of this method. Experiments indicate that SpeedStream's throughput and data processing delay are superior to those of other alternatives for iterative applications, applications with highly fluctuating traffic, and applications with high load-balancing demands.
This platform provides reliable, general-purpose, real-time solutions for processing massive data flows, such as processing real-time trading data in e-commerce, analyzing sensor data streams in the Internet of Things, and monitoring Internet traffic.
Database: OpenAIRE
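
Illustrative note: the abstract mentions directed cycle detection on a computing graph with sources and sinks, but this record does not give the algorithm. The following is only a minimal sketch of one standard approach (depth-first search with three node states), not the method used in SpeedStream. The function name `find_cycle` and the adjacency-dictionary representation of the topology are assumptions made for this example.

```python
from typing import Dict, List

# Node states for the depth-first search (hypothetical names for this sketch).
UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_cycle(graph: Dict[str, List[str]]) -> List[str]:
    """Return one directed cycle as a node path, or [] if the graph is a DAG.

    `graph` maps each operator name to the operators it sends data to.
    """
    # Collect every node, including sinks that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    state = {node: UNVISITED for node in nodes}
    path: List[str] = []  # nodes on the current DFS path, in order

    def visit(node: str) -> List[str]:
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph.get(node, []):
            if state[nxt] == IN_PROGRESS:
                # Back edge: nxt is already on the current path, so the
                # path from nxt to node plus this edge closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return []

    # Sorted order only makes the result deterministic.
    for node in sorted(nodes):
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return []
```

In this sketch a plain source-to-sink pipeline yields an empty list, while a pair of operators connected in both directions or an operator that feeds itself is reported as a cycle. A platform could use such a result to decide whether a topology needs iteration protection; how SpeedStream actually does this is not stated in the record.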