DisPatch: Distributed Pattern Matching over Streaming Time Series
Author: | Abdullah Mueen, Hossein Hamooni |
---|---|
Year of publication: | 2018 |
Subject: |
Matching (statistics); Data stream mining; Computer science; 020204 information systems; Search engine indexing; Real-time computing; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 02 engineering and technology; Pattern matching; Pruning (decision trees) |
Source: | IEEE BigData |
Description: | Matching a dictionary of patterns (i.e., subsequences) against a streaming time series to identify occurrences is one of the primary components of real-time monitoring systems such as earthquake monitoring, power consumption monitoring, and patient monitoring. These domains critically depend on timely alarms immediately after events (e.g., earthquake, fire, seizure) start. Until now, the problem has been solved independently by smart pruning, efficient approximation, and pattern indexing, without bounding the delay between pattern occurrence and detection time. Moreover, the complexity of the dictionary matching problem grows quickly with larger dictionary sizes, faster data streams, and stricter delay requirements, pushing existing pattern matching systems to their limits. In this paper, we describe a robust distributed matching system, called DisPatch (Distributed Pattern Matching), that matches a pattern with a guaranteed maximum delay after the pattern appears in the stream. We develop and evaluate a novel distribution strategy and integrate state-of-the-art algorithmic optimization techniques to scale horizontally to a high data rate and a large dictionary size. We show three use cases of DisPatch in seismic, patient, and power consumption monitoring. |
Database: | OpenAIRE |
External link: |
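
The problem stated in the description, matching a dictionary of patterns against a streaming time series, can be illustrated with a minimal single-process sketch. It is not the DisPatch system: it has no distribution strategy, no pruning, and no delay guarantee. The similarity measure (z-normalized Euclidean distance) and the names `znorm`, `match_stream`, and `threshold` are assumptions made for illustration and do not come from the record.

```python
from collections import deque
from math import sqrt


def znorm(xs):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(xs)
    mean = sum(xs) / n
    std = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in xs]


def match_stream(stream, dictionary, threshold):
    """Yield (end_index, pattern_name, distance) for every stream window
    whose z-normalized Euclidean distance to a dictionary pattern is at
    most `threshold` (hypothetical matching criterion)."""
    patterns = {name: znorm(p) for name, p in dictionary.items()}
    max_len = max(len(p) for p in patterns.values())
    buf = deque(maxlen=max_len)  # keeps only the most recent points
    for t, x in enumerate(stream):
        buf.append(x)
        recent = list(buf)
        for name, p in patterns.items():
            m = len(p)
            if len(recent) < m:
                continue  # not enough points yet for this pattern
            window = znorm(recent[-m:])
            d = sqrt(sum((a - b) ** 2 for a, b in zip(window, p)))
            if d <= threshold:
                yield t, name, d


if __name__ == "__main__":
    stream = [0, 0, 0, 1, 2, 3, 0, 0]
    for hit in match_stream(stream, {"ramp": [1, 2, 3]}, threshold=1e-9):
        print(hit)
```

Because the comparison is z-normalized, the window `[0, 1, 2]` matches the pattern `[1, 2, 3]` as well as the window `[1, 2, 3]` itself. Every arriving point is compared against every pattern, so the cost per point grows with dictionary size, which is the scaling pressure the description refers to.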