A Feature Importance Analysis for Soft-Sensing-Based Predictions in a Chemical Sulphonation Process

Authors: Per Olav Hansen, Asmund Hugo, An Ngoc Lam, Espen Martinsen, Øystein Haugen, Brice Morin, Enrique Garcia-Ceja
Language: English
Year of publication: 2020
Subject:
Signal Processing (eess.SP)
FOS: Computer and information sciences
Computer Science - Machine Learning
0209 industrial biotechnology
chemical
Computer science
Decision tree
02 engineering and technology
computer.software_genre
Machine Learning (cs.LG)
020901 industrial engineering & automation
feature selection
Linear regression
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
Range (statistics)
FOS: Electrical engineering, electronic engineering, information engineering
Electrical Engineering and Systems Science - Signal Processing
sulphonation
Computer Sciences
Regression analysis
prediction
Random forest
Variable (computer science)
machine learning
Datavetenskap (datalogi)
Metric (mathematics)
020201 artificial intelligence & image processing
Data mining
computer
Source: ICPS
Description: In this paper we present the results of a feature importance analysis of a chemical sulphonation process. The task is to predict the neutralization number (NT), a metric that characterizes the product quality of active detergents. The prediction is based on a dataset of environmental measurements sampled from an industrial chemical process. We used a soft-sensing approach, that is, predicting a variable of interest from other process variables instead of sensing it directly. Reasons for doing so range from expensive sensor hardware to harsh environments, e.g., inside a chemical reactor. The aim of this study was to identify which variables are the most relevant for predicting product quality, and to what degree of precision. We trained regression models based on linear regression, regression trees, and random forests. A random forest model was used to rank the predictor variables by importance. We then trained the models in a forward-selection style, adding one feature at a time, starting with the most important one. Our results show that the 3 most important of the 8 variables are sufficient to achieve satisfactory prediction results. On the other hand, the random forest model obtained the best result when trained with all variables.
Comment: Accepted for: 3rd IEEE International Conference on Industrial Cyber-Physical Systems (ICPS 2020)
Database: OpenAIRE
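
The two-step procedure summarized in the abstract (rank the predictors with a random forest, then retrain while adding one feature at a time in rank order) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins for the 8 process variables, the names x0 to x7 are hypothetical, linear regression is the only model retrained here, and mean absolute error on a held-out split is an assumed evaluation metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 8 predictors, of which only the
# first three actually drive the target. Not the dataset from the paper.
rng = np.random.default_rng(0)
names = [f"x{i}" for i in range(8)]  # hypothetical variable names
X = rng.normal(size=(400, 8))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: fit a random forest and rank predictors by impurity-based importance.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
order = np.argsort(forest.feature_importances_)[::-1]  # most important first
ranking = [names[i] for i in order]

# Step 2: forward-selection style loop. Retrain with the top-k ranked
# features for k = 1..8 and record the held-out error each time.
results = []
for k in range(1, len(order) + 1):
    cols = order[:k]
    model = LinearRegression().fit(X_train[:, cols], y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test[:, cols]))
    results.append((k, mae))
    print(f"top {k} features {ranking[:k]}: MAE = {mae:.3f}")
```

Plotting the recorded error against k is one way to see where extra variables stop helping. The same loop can be repeated with a regression tree or a random forest in place of `LinearRegression` to compare model families.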