Difference between Similars: A Novel Method to Use Topic Models for Sensor Data Analysis
Authors: | Takumi Eguchi, Tomonari Masada, Daisuke Hamaguchi |
---|---|
Year of publication: | 2019 |
Subject: |
Topic model
Computer science; Engineering and technology; Latent Dirichlet allocation; Set (abstract data type); Exploratory data analysis; Text mining; Information systems; Histogram; Electrical engineering, electronic engineering, information engineering; Factory (object-oriented programming); Artificial intelligence & image processing; Data mining; Representation (mathematics); Natural language |
Source: | ICDM Workshops |
Description: | We propose a novel method to use the topics obtained by topic modeling for sensor data analysis. This paper describes a case study where we perform an exploratory data analysis of manufacturing sensor data by using latent Dirichlet allocation (LDA) as a tool to discover remarkable change patterns. Our target is a set of time-series data originating from the sensors installed in a closed factory environment. Each sensor gives a different type of measurement of the same manufacturing process, which is operated repeatedly in a lot-by-lot manner. We first discretize the data based on the histogram of sensor measurements and construct a bag-of-words representation. We then apply LDA to discover change patterns across tens of thousands of lots. When LDA is applied to natural language documents, the resulting topics differ widely from each other because the documents intrinsically show considerable diversity. In contrast, our data, which come from the repeatedly operated manufacturing process, show only limited diversity. As a result, LDA provides topics that are very similar to each other. Our main and unexpected finding is that the difference between similar topics is useful in discovering remarkable change patterns. We performed an experiment on data sets containing sensor measurements collected in the factory. The results reveal that subtle differences between very similar topics often correspond to interesting change patterns in the sensor measurements. |
Database: | OpenAIRE |
External link: |
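
The steps named in the description (histogram-based discretization, a bag-of-words representation per lot, and comparison of very similar topics) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the equal-width binning, the `sensor_bK` word naming, the sensor names, the toy measurements, and the signed probability difference used to compare topics are all assumptions. The two topics are hand-made dictionaries standing in for the output of an LDA fit, which is not performed here.

```python
from collections import Counter


def bin_edges(values, n_bins):
    """Equal-width bin edges over the observed range of one sensor (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant sensor
    return [lo + i * width for i in range(n_bins + 1)]


def to_word(sensor, value, edges):
    """Map one measurement to a token such as 'temp_b2' (hypothetical naming)."""
    n_bins = len(edges) - 1
    width = edges[1] - edges[0]
    idx = int((value - edges[0]) / width)
    idx = min(max(idx, 0), n_bins - 1)  # clamp so the maximum falls in the last bin
    return f"{sensor}_b{idx}"


def lot_to_bag(lot, edges_by_sensor):
    """Turn one lot (sensor name -> list of measurements) into word counts."""
    bag = Counter()
    for sensor, series in lot.items():
        for value in series:
            bag[to_word(sensor, value, edges_by_sensor[sensor])] += 1
    return bag


def topic_difference(topic_a, topic_b, top_n=3):
    """Rank words by absolute difference in probability between two topics."""
    vocab = set(topic_a) | set(topic_b)
    diffs = {w: topic_a.get(w, 0.0) - topic_b.get(w, 0.0) for w in vocab}
    # Largest absolute difference first; ties broken alphabetically.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_n]


# Toy data: two lots, two sensors (values invented for illustration).
lots = [
    {"temp": [20.0, 21.0, 29.5], "pressure": [1.0, 1.1, 1.0]},
    {"temp": [20.5, 30.0, 30.0], "pressure": [1.0, 1.4, 1.5]},
]

# Bin edges are computed per sensor over all lots, then each lot becomes a "document".
edges_by_sensor = {
    s: bin_edges([v for lot in lots for v in lot[s]], n_bins=4)
    for s in ("temp", "pressure")
}
bags = [lot_to_bag(lot, edges_by_sensor) for lot in lots]

# Hand-made stand-ins for two nearly identical topic-word distributions.
topic_a = {"temp_b0": 0.5, "temp_b3": 0.125, "pressure_b0": 0.375}
topic_b = {"temp_b0": 0.5, "temp_b3": 0.375, "pressure_b0": 0.125}
top = topic_difference(topic_a, topic_b, top_n=2)
```

In a real analysis the `bags` would be passed to an LDA implementation of one's choice, and `topic_difference` would be applied to pairs of fitted topics that are close to each other. The words with the largest differences then point to the sensor value ranges where the two topics diverge.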