Investigation of the influence of outliers on text documents probabilistic classifier quality

Authors: Andrey I. Kapitanov, Elena L. Fedotova, Vladimir M. Troyanovskiy, Valentin V. Slyusar, Ilona I. Kapitanova
Publication year: 2017
Subject:
Source: 2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus).
Description: In this paper we investigate the influence of outliers in the training set on the quality of a probabilistic classifier. Using the naive Bayes classifier as an example, we show how the quality characteristics depend on the percentage of outliers. This dependence is evaluated with three basic metrics of classifier quality: precision, recall, and F1 score. Finally, we propose a method for reducing the influence of outliers on classifier quality by approximating a piecewise linear function and then applying gradient methods.
Database: OpenAIRE
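
Note: the three quality metrics named in the description (precision, recall, F1 score) can be illustrated with a minimal Python sketch. This is not the authors' code; the function name `precision_recall_f1` and the label lists are hypothetical and made up for illustration, and the sketch assumes a single positive class.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, invented for this example only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Repeating such a measurement while varying the share of outliers in the training set would give a dependence of the kind the description mentions; how the authors actually constructed it is not stated in this record.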