Methods for processing and filling missing parameters in environmental monitoring data
Author: | Roman Tkachenko, O. S. Mishchuk |
---|---|
Year of publication: | 2019 |
Subject: |
010302 applied physics
gaps in data; processing of missing elements; methods for filling gaps; regression modeling; 020206 networking & telecommunications; Environmental pollution; Regression analysis; 02 engineering and technology; computer.software_genre; Missing data; 01 natural sciences; Random forest; Support vector machine; Stochastic gradient descent; Complete information; Multilayer perceptron; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; lcsh:SD1-669.5; General Earth and Planetary Sciences; Data mining; lcsh:Forestry; computer; General Environmental Science |
Source: | Науковий вісник НЛТУ України, Vol 29, Iss 6, Pp 119-122 (2019) |
ISSN: | 2519-2477 1994-7836 |
Description: | Sociological, economic, statistical, information and many other kinds of studies face the problem of processing missing data. The traditional reasons for gaps in data are the inability to obtain information, its distortion, or even its concealment. In environmental pollution monitoring data, the causes can include: breakdown of devices; adverse weather conditions; errors of measuring devices; damage to information carriers; suspension of measurements during weekends; and taking only the minimum number of measurements allowed by state standards. As a result, incomplete information is provided for the analysis of the collected data. Today there are many methods for recovering missing parameters in data, but different methods are used in each application area. The paper analyzes the following methods for processing missing data: removal of elements with gaps, the weighting method, and filling in the missing parameters. It describes three mechanisms by which missing parameters arise: one in which the probability of a gap is the same for every record, one in which the probability of a gap is determined by other available complete information, and one in which data is missing depending on unknown factors. There is a need to analyze existing methods and study new ones for filling missing values in environmental monitoring data sets, in order to find an algorithm that best meets the demands for speed, efficiency and accuracy in filling missing parameters. The authors therefore analyze methods for filling missing parameters in environmental monitoring data such as mean-value filling, naive forecast, and regression modeling. The article describes the following regression-based methods for filling missing data: multilayer perceptron; Adaptive Boosting; support vector machine; Random Forest; and linear regression using stochastic gradient descent. 
The simplest methods of filling missing data are compared with the methods based on regression models. It has been experimentally shown that the previously developed gap-filling method based on the neural-like structure of the successive geometric transformations model is the most effective, since it gives the most accurate results. |
Database: | OpenAIRE |
External link: |
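
The description above names mean-value filling, naive forecast, and regression modeling as gap-filling approaches without showing how they work. The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the `None` marker for a gap, and the sample readings are all hypothetical. Ordinary least squares on a single fully observed predictor stands in for the regression models listed in the description (multilayer perceptron, Adaptive Boosting, and so on), which are not reproduced here.

```python
from statistics import mean
from typing import List, Optional

# A series in which None marks a missing measurement (assumed convention).
Series = List[Optional[float]]


def fill_mean(values: Series) -> List[float]:
    """Replace every gap with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    average = mean(observed)
    return [average if v is None else v for v in values]


def fill_naive(values: Series) -> List[float]:
    """Naive forecast: carry the last observed value forward.

    Leading gaps, which have no earlier value, take the first observed value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to carry forward")
    last = observed[0]
    filled = []
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_linear_regression(target: Series, predictor: List[float]) -> List[float]:
    """Fit target = intercept + slope * predictor on complete pairs,
    then predict the target wherever it is missing."""
    pairs = [(x, y) for x, y in zip(predictor, target) if y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs to fit a line")
    x_mean = mean([x for x, _ in pairs])
    y_mean = mean([y for _, y in pairs])
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant on the complete pairs")
    slope = sum((x - x_mean) * (y - y_mean) for x, y in pairs) / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x if y is None else y
            for x, y in zip(predictor, target)]


# Hypothetical readings: a pollutant series with two gaps and a
# fully observed companion variable used as the predictor.
readings = [10.0, None, 14.0, None, 18.0]
companion = [1.0, 2.0, 3.0, 4.0, 5.0]

mean_filled = fill_mean(readings)
naive_filled = fill_naive(readings)
regression_filled = fill_linear_regression(readings, companion)
```

On this toy series the mean fill puts the same value in every gap, the naive forecast repeats the preceding reading, and the regression fill follows the trend of the companion variable. That difference is the motivation for comparing simple and regression-based methods.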