Abstract: |
Particulate matter with a diameter of 2.5 µm or less (PM2.5) is one of the most important air pollutants owing to its harmful effects on health. However, PM2.5 data measured in air quality monitoring networks may contain large numbers of missing values owing to equipment failure. We conducted a comparative study of imputation techniques for estimating missing values in PM2.5, which was regularly measured by the air quality monitoring network of Lima, Peru, the second most polluted city in South America. Various imputation techniques were implemented to estimate missing values in the PM2.5 time series, including moving average-based approaches (e.g., Autoregressive Integrated Moving Average, ARIMA; Exponentially Weighted Moving Average, EWMA; Linear Weighted Moving Average, LWMA; and Local Average of Nearest Neighbors, LANN), interpolation-based models (e.g., spline), and deep learning-based methods (e.g., Long Short-Term Memory, LSTM; Bidirectional LSTM; Gated Recurrent Unit, GRU; and Bidirectional GRU). The experiments used a dataset spanning 11,822 h, with 80% used for training and the remaining 20% for testing. In terms of RMSE, MAPE, and R², the moving average-based techniques outperformed the deep learning-based techniques across different configurations of short gaps of missing values. Among the moving average-based techniques, ARIMA was the best model for estimating missing values in the PM2.5 time series, with MAPE values ranging from 0.0005% to 11.6522%. [ABSTRACT FROM AUTHOR]
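To make the weighted moving-average imputers and the evaluation metrics concrete, here is a minimal sketch. It is not the authors' implementation, and it does not cover ARIMA, spline interpolation, or the recurrent networks. The function names (`weighted_ma_impute`, `rmse`, `mape`, `r2`), the window half-width `k`, and the weighting rules (weight 1/(d+1) for linear and 1/2^d for exponential, where d is the distance in time steps from the gap) are assumptions chosen for illustration. The toy series is synthetic, not PM2.5 data from the study.

```python
import math


def weighted_ma_impute(series, k=4, weighting="linear"):
    """Fill None entries with a weighted mean of observed neighbours (hypothetical helper).

    Only original observations are used, never previously imputed values.
    If no observation lies within k steps, the window is widened until one is found.
    """
    n = len(series)
    if all(v is None for v in series):
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        radius = k
        while True:
            num = den = 0.0
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                x = series[j]
                if x is None:
                    continue
                d = abs(i - j)  # d >= 1 because series[i] is missing
                # Assumed weights: linear (LWMA-like) or exponential (EWMA-like) decay.
                w = 1.0 / (d + 1) if weighting == "linear" else 0.5 ** d
                num += w * x
                den += w
            if den > 0:
                filled[i] = num / den
                break
            radius += 1  # all neighbours missing: widen the window
    return filled


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination R^2."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic example: hide two known values, impute them, and score the estimates.
    truth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
    gappy = [10.0, 12.0, None, 16.0, None, 20.0, 22.0]
    est = weighted_ma_impute(gappy, k=2, weighting="linear")
    idx = [i for i, v in enumerate(gappy) if v is None]
    t = [truth[i] for i in idx]
    p = [est[i] for i in idx]
    print("RMSE", rmse(t, p), "MAPE %", mape(t, p), "R2", r2(t, p))
```

Scoring only the artificially hidden positions mirrors the usual way imputation is evaluated: values with known ground truth are masked, filled in, and compared against what was actually observed.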