Data Change Exploration Using Time Series Clustering
Authors: | Divesh Srivastava, Felix Naumann, Leon Bornemann, Dmitri V. Kalashnikov, Tobias Bleifuß |
---|---|
Year of publication: | 2018 |
Subject: |
Series (mathematics); Computer science; Research areas; 02 engineering and technology; computer.software_genre; Behavioral traits; Transformation (function); Similarity (network science); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Static data; Cluster analysis; Unsupervised clustering; computer |
Source: | Datenbank-Spektrum. 18:79-87 |
ISSN: | 1610-1995 1618-2162 |
DOI: | 10.1007/s13222-018-0285-x |
Description: | Analysis of static data is one of the best-studied research areas. However, data changes over time, and these changes may reveal patterns or groups of similar values, properties, and entities. We study changes in large, publicly available data repositories by modelling them as time series and clustering these series by their similarity. To perform change exploration on real-world data, we use the publicly available revision data of Wikipedia Infoboxes and weekly snapshots of IMDB. Changes to the data are captured as events, which we call change records. To extract temporal behavior, we count changes per time period and propose a general transformation framework that aggregates groups of changes into numerical time series of different resolutions. We use these time series to study different application scenarios of unsupervised clustering. Our exploratory results show that changes made to collaboratively edited data sources can help find characteristic behavior, distinguish entities or properties, and provide insight into the respective domains. |
Database: | OpenAIRE |
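The description outlines a pipeline: count change records per time period, turn the counts into numerical time series at a chosen resolution, and cluster the series by similarity. The following is a minimal sketch of that idea, not the authors' implementation. The record does not name the similarity measure or the clustering algorithm, so k-means with Euclidean distance is an assumption made here, and all function names, entity names, and dates are hypothetical.

```python
from datetime import date


def to_time_series(change_dates, start, end, period_days):
    """Count changes per fixed-length period in [start, end)."""
    span_days = (end - start).days
    n_periods = -(-span_days // period_days)  # ceiling division
    counts = [0] * n_periods
    for d in change_dates:
        if start <= d < end:
            counts[(d - start).days // period_days] += 1
    return counts


def normalize(series):
    """Scale a series to sum to 1 so that shape, not volume, is compared."""
    total = sum(series)
    return [x / total for x in series] if total else [0.0] * len(series)


def distance(a, b):
    """Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans(series_by_key, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    keys = sorted(series_by_key)
    points = [series_by_key[key] for key in keys]
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(distance(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each series to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: distance(p, centroids[i])) for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(keys, labels))


# Hypothetical change records: entity -> dates on which it was changed.
change_records = {
    "entity_a": [date(2018, 1, 2), date(2018, 1, 5), date(2018, 1, 9)],
    "entity_b": [date(2018, 1, 3), date(2018, 1, 4)],
    "entity_c": [date(2018, 1, 22), date(2018, 1, 25), date(2018, 1, 27)],
    "entity_d": [date(2018, 1, 24), date(2018, 1, 26)],
}
start, end = date(2018, 1, 1), date(2018, 1, 29)

# Weekly resolution; passing period_days=14 would give a coarser series.
weekly = {
    key: normalize(to_time_series(dates, start, end, 7))
    for key, dates in change_records.items()
}
clusters = kmeans(weekly, k=2)
print(clusters)
```

In this toy data, the entities edited early in the month end up in one cluster and those edited late in the other. Normalizing each series is a design choice that groups entities by the timing of their changes rather than by how often they change; skipping it would let edit volume dominate the distance.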