A local multiscale probabilistic graphical model for data validation and reconstruction, and its application in industry
Authors: | Uriel A. García, Luis Enrique Sucar, Javier Herrera-Vega, Dan-El N. Vila Rosado, Eduardo F. Morales, Felipe Orihuela-Espina, Pablo H. Ibargüengoytia |
---|---|
Year of publication: | 2018 |
Subject: |
Technology
Discretization; PARTIAL LEAST-SQUARES; Computer science; Engineering Multidisciplinary; Data validation; 02 engineering and technology; computer.software_genre; Computer Science Artificial Intelligence; 09 Engineering; Synthetic data; Automation & Control Systems; Engineering; Artificial Intelligence; 020204 information systems; Outlier detection; 0202 electrical engineering electronic engineering information engineering; Artificial Intelligence & Image Processing; Graphical model; Electrical and Electronic Engineering; Probabilistic graphical models; Ground truth; Science & Technology; Probabilistic logic; Bayesian network; Engineering Electrical & Electronic; Bayesian networks; Control and Systems Engineering; Computer Science; Outlier; 020201 artificial intelligence & image processing; 08 Information and Computing Sciences; Data mining; computer; Multiscale approach |
Source: | Engineering Applications of Artificial Intelligence. 70:1-15 |
ISSN: | 0952-1976 |
Description: | The detection and subsequent reconstruction of incongruent data in time series by observing statistically related information is a recurrent issue in data validation. Unlike outliers, incongruent observations are not necessarily confined to the extremes of the data distribution. Instead, these rogue observations are values that are unlikely in light of statistically related information. This paper proposes a multiresolution Bayesian network model for the detection of rogue values and posterior reconstruction of the erroneous sample in non-stationary time series. Our method builds local Bayesian network models that best fit segments of the data in order to achieve a finer discretization and hence improve data reconstruction. Our local multiscale approach is compared against its single-scale global predecessor (taken as our gold standard) in terms of predictive power; both error detection and error reconstruction capabilities are assessed. The parameterization and verification of the model are evaluated over three synthetic data source topologies. The algorithm is then further tested on real data from the steel industry, where the aforementioned problem characteristics are met but the ground truth is unknown. The proposed local multiscale approach was found to deal better with increasing complexity in data topologies. |
Database: | OpenAIRE |
External link: |
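
The description states that fitting local models to segments of a non-stationary series allows a finer discretization than a single global model. The sketch below illustrates only that point and is not the authors' algorithm: the synthetic series, the fixed segment length, the number of bins, and the use of equal-width bins are all assumptions made for illustration, since the record does not give the paper's segmentation or discretization criteria.

```python
import math


def bin_width(values, n_bins):
    """Width of one equal-width bin spanning the range of `values`."""
    return (max(values) - min(values)) / n_bins


def local_bin_widths(series, segment_len, n_bins):
    """Equal-width bin size computed separately for each fixed-length segment."""
    segments = [series[i:i + segment_len]
                for i in range(0, len(series), segment_len)]
    return [bin_width(seg, n_bins) for seg in segments]


# Hypothetical non-stationary series: a slow upward drift plus an oscillation.
series = [0.05 * t + math.sin(0.5 * t) for t in range(200)]

N_BINS = 5        # assumed number of discrete states per variable
SEGMENT_LEN = 50  # assumed fixed segment length (not taken from the paper)

# One discretization over the whole series versus one per segment.
global_width = bin_width(series, N_BINS)
local_widths = local_bin_widths(series, SEGMENT_LEN, N_BINS)

print(f"global bin width: {global_width:.3f}")
print("local bin widths:", [round(w, 3) for w in local_widths])
```

With the same number of states, each local bin covers a narrower interval than a global bin because a segment spans less of the drifting range. If a reconstructed value were taken as, say, a bin midpoint (again an assumption), its worst-case quantization error would be half the bin width and therefore smaller under the local discretization.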