Power-law mixtures of Bayesian forests for value added tax audit case selection

Authors:	Sotirios P. Chatzis, Christos Kleanthous, Theodoros Christophides
Language:	English
Publication year:	2020
Subject:	Inference engines; Process (engineering); Computer science; Decision trees; Mechanical Engineering; Bayesian probability; Audit selection; Audit; Non-parametric Bayesian mixture model; Space (commercial competition); Random forests; Bayesian inference; Random forest; Value-added tax; Bayesian networks; Mixtures; Outlier; Econometrics; Engineering and Technology
Source:	ICAIF
Description:	Tax authorities need to maximize the yield of the limited number of tax audits they can afford to perform each year. Thus, they need to predict the likelihood that a candidate audit will result in a satisfactory yield; this predictive process is usually referred to as audit case selection. Random Forests (RFs) constitute a standard method for Value Added Tax (VAT) audit case selection. Despite their success, however, their predictive performance is still below the expectations of tax authorities, which need to detect cases of significant audit yield potential in a timely manner. This lackluster performance is mainly attributed to the fact that RFs cannot deal with data that exhibit non-stationarity, multiple modalities, or discontinuities. These are common characteristics of real-world datasets; thus, the inability to properly address them is a likely cause of the RFs' underperformance. This work addresses these issues by considering a generative non-parametric Bayesian model with power-law behavior, capable of generating distinct (Bayesian) RFs over the observation space of the modeled data. In this way, our approach can capture an indefinite number of distinct classification patterns while effectively handling outliers. The latter advantage is of paramount importance for the effectiveness of the modeling procedure in cases where a few large parts of the observation space can be modeled by a few RF classifiers, yet there is a large number of small parts of the observation space that require distinct RFs to be properly modeled (power-law nature). We provide an efficient algorithm for model inference, based on the variational Bayesian framework, and demonstrate its efficacy using real-world datasets.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4ce3c33143bc9c16f30cf5aac2b1ca64 https://hdl.handle.net/20.500.14279/29880
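
Illustration:	The power-law behavior mentioned in the description (a few large regions of the observation space, plus many small ones that each need their own classifier) can be sketched with a Pitman-Yor Chinese restaurant process. This is an assumption made for illustration only: the record does not name the prior the authors use, and the function name `pitman_yor_partition` and its parameters are hypothetical. The sketch samples only the partition of observations into mixture components; it does not fit the per-component Bayesian RFs or perform the variational inference described above.

```python
import random


def pitman_yor_partition(n, concentration=1.0, discount=0.5, seed=0):
    """Sample component sizes for n observations from a Pitman-Yor
    Chinese restaurant process (illustrative; not the authors' code).

    discount = 0 recovers the Dirichlet process; 0 < discount < 1 yields
    a power-law growth in the number of components.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    sizes = []  # sizes[j] = number of observations in component j
    for i in range(n):
        k = len(sizes)
        # Unnormalized mass for opening a new component.
        new_mass = concentration + discount * k
        # Total mass is new_mass + sum(sizes[j] - discount) = concentration + i.
        u = rng.random() * (concentration + i)
        if u < new_mass:
            sizes.append(1)
            continue
        u -= new_mass
        for j, size in enumerate(sizes):
            u -= size - discount
            if u < 0:
                sizes[j] += 1
                break
        else:
            # Guard against floating-point rounding at the upper edge.
            sizes[-1] += 1
    return sizes


if __name__ == "__main__":
    power_law = pitman_yor_partition(2000, discount=0.5, seed=1)
    dirichlet = pitman_yor_partition(2000, discount=0.0, seed=1)
    print("components with discount 0.5:", len(power_law))
    print("components with discount 0.0:", len(dirichlet))
    print("largest five (discount 0.5):", sorted(power_law, reverse=True)[:5])
```

In the setting the description outlines, each sampled component would correspond to one distinct RF classifier; a positive discount produces the long tail of small components that lets isolated or outlying cases receive their own model.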