Difference between Similars: A Novel Method to Use Topic Models for Sensor Data Analysis

Authors: Takumi Eguchi, Tomonari Masada, Daisuke Hamaguchi
Year of publication: 2019
Subject:
Source: ICDM Workshops
Description: We propose a novel method to use the topics obtained by topic modeling for sensor data analysis. This paper describes a case study in which we perform an exploratory analysis of manufacturing sensor data, using latent Dirichlet allocation (LDA) as a tool to discover remarkable change patterns. Our target is a set of time-series data originating from sensors installed in a closed factory environment. Each sensor gives a different type of measurement of the same manufacturing process, which is operated repeatedly in a lot-by-lot manner. We first discretize the data based on the histogram of sensor measurements and construct a bag-of-words representation. We then apply LDA to discover change patterns across tens of thousands of lots. When LDA is applied to natural language documents, the resulting topics differ widely from each other because the documents intrinsically show considerable diversity. In contrast, our data, which come from a repeatedly operated manufacturing process, show only limited diversity. As a result, LDA yields topics that are very similar to each other. Our main and unexpected finding is that the difference between similar topics is useful in discovering remarkable change patterns. We performed an experiment on data sets containing sensor measurements collected in the factory. The results reveal that subtle differences between very similar topics often correspond to interesting change patterns in the sensor measurements.
Database: OpenAIRE
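
The pipeline outlined in the description (discretize sensor measurements by histogram bin, build a bag-of-words per lot, then inspect the difference between similar topics) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the equal-width binning, the `sensor_bN` word naming, the cosine similarity measure, and all function names are assumptions. The LDA fitting step itself is omitted; `topics` stands for topic-word probability rows obtained from any LDA implementation.

```python
from bisect import bisect_right
from collections import Counter
from itertools import combinations
import math


def bin_edges(values, n_bins):
    """Interior edges of equal-width bins over the observed range (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def lot_to_words(lot, edges_by_sensor):
    """Turn one lot, given as {sensor: [measurements]}, into a bag of 'sensor_bN' words."""
    bag = Counter()
    for sensor, series in lot.items():
        edges = edges_by_sensor[sensor]
        for x in series:
            # bisect_right gives the index of the bin that x falls into.
            bag[f"{sensor}_b{bisect_right(edges, x)}"] += 1
    return bag


def cosine(p, q):
    """Cosine similarity between two topic-word vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def most_similar_pair(topics):
    """Indices (i, j) of the two topics with the highest cosine similarity."""
    pairs = combinations(range(len(topics)), 2)
    return max(pairs, key=lambda ij: cosine(topics[ij[0]], topics[ij[1]]))


def largest_differences(topics, vocab, i, j, top=3):
    """Words whose probabilities differ most between topics i and j."""
    diffs = [(vocab[k], topics[i][k] - topics[j][k]) for k in range(len(vocab))]
    return sorted(diffs, key=lambda wd: abs(wd[1]), reverse=True)[:top]
```

Under these assumptions, the words returned by `largest_differences` for a highly similar topic pair point to the sensors and measurement ranges where the two topics diverge, which is where a change pattern would be sought.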