Interpolation of non-random missing values in financial statements’ big data using CatBoost
Author: | Shouji Fujimoto, Takayuki Mizuno, Atushi Ishikawa |
---|---|
Year of publication: | 2022 |
Subject: | |
Source: | Journal of Computational Social Science. 5:1281-1301 |
ISSN: | 2432-2725 2432-2717 |
Description: | Financial statements’ big data have the characteristics of “Incompleteness” and “Nonrepresentative”. In this paper, employing the world’s largest commercial database on finance, ORBIS, we first find that the rate of missing data varies depending on the country, the type and size of financial items, and the year. Using information on missing data, we interpolate non-random missing financial variables from the previous- and/or next-year values of the same financial item, the values of other financial items, and the conditions of missing values determined by CatBoost. Because the distribution of financial values obeys Zipf’s law in the large-scale range and the mean and variance diverge, we employ an inverse hyperbolic function to convert the value of a financial item as a target variable. We introduce two types of missing-value interpolation models according to the two types of situations involving missing objective variables. After verifying the accuracies and stabilities of these models, we describe the properties of firm-scale variables in which non-random missing values are interpolated. In the final stage of this work, we combine these two models. From our observations, we confirm that the range in which Zipf’s law is established becomes wider than before interpolation. |
Database: | OpenAIRE |
External link: |
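
The description mentions converting the target variable with an inverse hyperbolic function before interpolation. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the function is the inverse hyperbolic sine (`asinh`), which the record does not state explicitly. The helper names `to_asinh`, `from_asinh`, and `interpolate_from_neighbors` are hypothetical, and the neighbor average is only a simple stand-in for the trained CatBoost model.

```python
import math
from typing import Optional


def to_asinh(value: float) -> float:
    """Compress a heavy-tailed financial value (assumed transform: asinh).

    Unlike a plain logarithm, asinh is defined for zero and negative values.
    """
    return math.asinh(value)


def from_asinh(transformed: float) -> float:
    """Map a value in asinh space back to the original scale."""
    return math.sinh(transformed)


def interpolate_from_neighbors(prev_value: Optional[float],
                               next_value: Optional[float]) -> float:
    """Hypothetical stand-in for the trained model.

    Averages whichever of the previous- and next-year values are available
    in asinh space, then converts the result back to the original scale.
    """
    known = [to_asinh(v) for v in (prev_value, next_value) if v is not None]
    if not known:
        raise ValueError("at least one neighboring year is required")
    return from_asinh(sum(known) / len(known))


# For large values, averaging in asinh space behaves like a geometric mean,
# so the estimate for neighbors 100 and 10,000 lands near 1,000.
estimate = interpolate_from_neighbors(100.0, 10_000.0)
print(round(estimate))
```

Working in the transformed space keeps a few very large firms from dominating the estimate, which is the motivation the description gives for the conversion. In the approach described above, a gradient-boosting regressor would replace the neighbor average and would also use other financial items and the missingness pattern as features.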