Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies
Authors: | H. Al-Jeelani, Antti Hyvärinen, Lubna Dada, Heikki Lihavainen, Tareq Hussein, Mansour A. Alghamdi, Martha A. Zaidan |
---|---|
Contributors: | Global Atmosphere-Earth surface feedbacks, INAR Physics, Air quality research group, Department of Physics |
Publication year: | 2019 |
Subject: |
010504 meteorology & atmospheric sciences
Computer science air pollution Air pollution Data loss 010501 environmental sciences Overfitting medicine.disease_cause Machine learning computer.software_genre lcsh:Technology 01 natural sciences 114 Physical sciences probabilistic machine learning lcsh:Chemistry medicine General Materials Science mutual information Proxy (statistics) ozone proxy lcsh:QH301-705.5 Instrumentation 1172 Environmental sciences 0105 earth and related environmental sciences Fluid Flow and Transfer Processes Pollutant lcsh:T business.industry Process Chemistry and Technology 213 Electronic automation and communications engineering electronics General Engineering Probabilistic logic Mutual information Missing data lcsh:QC1-999 Computer Science Applications lcsh:Biology (General) lcsh:QD1-999 lcsh:TA1-2040 Artificial intelligence lcsh:Engineering (General). Civil engineering (General) business computer lcsh:Physics |
Source: | Applied Sciences, Vol 9, Iss 20, p 4475 (2019) |
ISSN: | 2076-3417 |
DOI: | 10.3390/app9204475 |
Description: | An air pollutant proxy is a mathematical model that estimates an unobserved air pollutant using other measured variables. The proxy is advantageous to fill missing data in a research campaign or to substitute a real measurement for minimising the cost as well as the operators involved (i.e., virtual sensor). In this paper, we present a generic concept of pollutant proxy development based on an optimised data-driven approach. We propose a mutual information concept to determine the interdependence of different variables and thus select the most correlated inputs. The most relevant variables are selected to be the best proxy inputs, where several metrics and data loss are also involved for guidance. The input selection method determines the used data for training pollutant proxies based on a probabilistic machine learning method. In particular, we use a Bayesian neural network that naturally prevents overfitting and provides confidence intervals around its output prediction. In this way, the prediction uncertainty could be assessed and evaluated. In order to demonstrate the effectiveness of our approach, we test it on an extensive air pollution database to estimate ozone concentration. |
Database: | OpenAIRE |
External link: |
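
The abstract describes ranking candidate inputs by their mutual information with the target pollutant. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a simple equal-width histogram estimator, and the variable names (`temperature`, `noise_var`, `ozone`), the helper functions, and the data are all hypothetical and synthetic. The target is made to depend on `temperature` through a quadratic relation, the kind of dependence a linear correlation coefficient can miss.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X;Y) in nats (an assumed estimator)."""
    def discretise(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant inputs
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretise(x), discretise(y)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def rank_inputs(candidates, target, bins=8):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(column, target, bins)
        for name, column in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic, hypothetical data: the target depends non-linearly on
# "temperature" and not at all on "noise_var".
rng = random.Random(0)
n_samples = 2000
temperature = [rng.uniform(0, 40) for _ in range(n_samples)]
noise_var = [rng.gauss(0, 1) for _ in range(n_samples)]
ozone = [(t - 20) ** 2 / 10 + rng.gauss(0, 1) for t in temperature]

ranking = rank_inputs(
    {"temperature": temperature, "noise_var": noise_var}, ozone
)
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

In this sketch the informative variable should rank first, and the top-ranked inputs would then be passed on to train the proxy model. A histogram estimator is biased upward for small samples, so the bin count and sample size are choices that would need tuning on real measurements.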