A Scalable Approach for Outlier Detection in Edge Streams Using Sketch-based Approximations
Autor: | Kshitij Sharma, Steve Harenberg, Nagiza F. Samatova, Stephen Ranshous |
---|---|
Rok vydání: | 2016 |
Předmět: |
Theoretical computer science
Computer science Probabilistic logic Context (language use) 02 engineering and technology computer.software_genre Sketch Set (abstract data type) 020204 information systems Outlier Scalability 0202 electrical engineering electronic engineering information engineering 020201 artificial intelligence & image processing Anomaly detection Data mining Enhanced Data Rates for GSM Evolution computer |
Zdroj: | SDM |
DOI: | 10.1137/1.9781611974348.22 |
Popis: | Dynamic graphs are a powerful way to model an evolving set of objects and their ongoing interactions. A broad spectrum of systems, such as information, communication, and social, are naturally represented by dynamic graphs. Outlier (or anomaly) detection in dynamic graphs can provide unique insights into the relationships of objects and identify novel or emerging relationships. To date, outlier detection in dynamic graphs has been studied in the context of graph streams, focusing on the analysis and comparison of entire graph objects. However, the volume and velocity of data are necessitating a transition from outlier detection in the context of graph streams to outlier detection in the context of edge streams–where the stream consists of individual graph edges instead of entire graph objects. In this paper, we propose the first approach for outlier detection in edge streams. We first describe a highlevel model for outlier detection based on global and local structural properties of a stream. We propose a novel application of the Count-Min sketch for approximating these properties, and prove probabilistic error bounds on our edge outlier scoring functions. Our sketch-based implementation provides a scalable solution, having constant time updates and constant space requirements. Experiments on synthetic and real world datasets demonstrate our method’s scalability, effectiveness for discovering outliers, and the effects of approximation. |
Databáze: | OpenAIRE |
Externí odkaz: |