A local multiscale probabilistic graphical model for data validation and reconstruction, and its application in industry

Authors: Uriel A. García, Luis Enrique Sucar, Javier Herrera-Vega, Dan-El N. Vila Rosado, Eduardo F. Morales, Felipe Orihuela-Espina, Pablo H. Ibargüengoytia
Year of publication: 2018
Subject:
Technology
Discretization
Partial least squares
Computer science
Engineering, Multidisciplinary
Data validation
02 engineering and technology
Computer Science, Artificial Intelligence
09 Engineering
Synthetic data
Automation & Control Systems
Engineering
Artificial Intelligence
020204 information systems
Outlier detection
0202 electrical engineering, electronic engineering, information engineering
Artificial Intelligence & Image Processing
Graphical model
Electrical and Electronic Engineering
Probabilistic graphical models
Ground truth
Science & Technology
Probabilistic logic
Bayesian network
Engineering, Electrical & Electronic
Bayesian networks
Control and Systems Engineering
Outlier
020201 artificial intelligence & image processing
08 Information and Computing Sciences
Data mining
Multiscale approach
Source: Engineering Applications of Artificial Intelligence. 70:1-15
ISSN: 0952-1976
Description: The detection and subsequent reconstruction of incongruent data in time series by means of statistically related information is a recurrent issue in data validation. Unlike outliers, incongruent observations are not necessarily confined to the extremes of the data distribution; instead, these rogue observations are values that are unlikely in the light of statistically related information. This paper proposes a multiresolution Bayesian network model for the detection of rogue values and the subsequent reconstruction of the erroneous samples in non-stationary time series. Our method builds local Bayesian network models that best fit segments of the data, in order to achieve a finer discretization and hence improve data reconstruction. Our local multiscale approach is compared against its single-scale global predecessor (taken as the gold standard) in terms of predictive power, assessing both error detection and error reconstruction capabilities. The parameterization and verification of the model are evaluated over three synthetic data-source topologies. The algorithm is then further tested on real data from the steel industry, where the aforementioned problem characteristics are present but the ground truth is unknown. The proposed local multiscale approach was found to deal better with increasing complexity in the data topologies.
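Illustrative sketch (not the authors' model): the idea of detecting an incongruent value from a statistically related variable, and of why local discretization helps, can be shown with a toy example. It assumes a two-node network Y → X over equal-width bins, Laplace smoothing, a leave-one-out probability threshold and fixed-length segments; the function names (`discretize`, `validate_segment`, `local_validate`) and every parameter value are hypothetical. "Local" here only means that each segment is discretized and fitted on its own; the paper's multiscale fitting of segments is not reproduced.

```python
import numpy as np


def discretize(values, n_bins):
    """Equal-width bins over the range of `values`; returns bin indices and bin centres."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    idx = np.digitize(values, edges[1:-1])  # interior edges -> indices 0..n_bins-1
    centres = 0.5 * (edges[:-1] + edges[1:])
    return idx, centres


def validate_segment(x, y, n_bins, alpha, threshold):
    """Hypothetical two-node network Y -> X fitted on one segment: flag and reconstruct x."""
    xb, x_centres = discretize(x, n_bins)
    yb, _ = discretize(y, n_bins)
    # Conditional probability table P(X_bin | Y_bin) from counts with Laplace smoothing.
    counts = np.full((n_bins, n_bins), alpha)
    np.add.at(counts, (yb, xb), 1.0)
    # Leave-one-out probability: remove each sample's own count before scoring it.
    p_loo = (counts[yb, xb] - 1.0) / (counts[yb].sum(axis=1) - 1.0)
    suspect = p_loo < threshold
    # Reconstruct with the centre of the most probable X bin given the observed Y bin.
    x_hat = x_centres[np.argmax(counts[yb], axis=1)]
    return suspect, np.where(suspect, x_hat, x)


def local_validate(x, y, segment_len, n_bins=5, alpha=1.0, threshold=0.1):
    """Fit one model per fixed-length segment, so each segment gets its own bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    suspect = np.zeros(len(x), dtype=bool)
    x_rec = x.copy()
    for start in range(0, len(x), segment_len):
        sl = slice(start, start + segment_len)
        suspect[sl], x_rec[sl] = validate_segment(x[sl], y[sl], n_bins, alpha, threshold)
    return suspect, x_rec


if __name__ == "__main__":
    t = np.arange(100)
    # Level shift at t = 50 makes the series non-stationary; x is tied to y by x = 2y + 1.
    y = np.where(t < 50, t, 1000 + (t - 50)).astype(float)
    x = 2.0 * y + 1.0
    # Rogue value: 41 is an ordinary value of x (it occurs at t = 20), but not given y[45].
    x[45] = 41.0
    local_flags, x_rec = local_validate(x, y, segment_len=50)
    global_flags, _ = local_validate(x, y, segment_len=len(x))
    print("local:", np.flatnonzero(local_flags), x_rec[45])
    print("global:", np.flatnonzero(global_flags))
```

With one model per 50-sample segment, only index 45 is flagged and it is reconstructed as the centre of the most probable bin (about 89.2; the true value is 91). With a single global model, the level shift squeezes each half of the series into one bin, so the rogue value shares a bin with its neighbours and nothing is flagged. This mirrors, in a very reduced form, the motivation for a finer local discretization.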
Database: OpenAIRE