A Hybrid Technique for Health Insurance Fraud Detection on Highly Imbalanced Dataset

Authors: Ilango Velchamy, Shamitha S Kotekani
Year of publication: 2019
Subject:
Source: International Journal of Innovative Technology and Exploring Engineering. 8:3498-3501
ISSN: 2278-3075
DOI: 10.35940/ijitee.k2489.0981119
Description: The health insurance industry produces a massive amount of heterogeneous data, and detecting fraud in these data is a challenging task. Highly imbalanced data pose a major challenge for insurance data analysis, and classifying imbalanced data is a critical issue for fraud detection methodologies: fraudulent cases make up less than 10% of the data. In this study, we use highly imbalanced data and propose a hybrid method that addresses the class imbalance problem by combining SMOTE, cross-validation, and Random Forest. We applied various sampling techniques to Medicare data and then built a classification model. We observed that SMOTE with Random Forest and cross-validation produced excellent results. The model should be capable of identifying all relevant (fraud) instances, i.e., it should have high recall. SMOTE with Random Forest achieved an average recall of 86% and an overall accuracy of 90%, which can be considered good relative to existing models.
Database: OpenAIRE
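
Note: the abstract names SMOTE oversampling and recall as the core ingredients. The sketch below is a minimal, standard-library illustration of those two ideas only; it is not the authors' implementation, and it omits the Random Forest and cross-validation steps. The toy points, the helper names `smote` and `recall`, and the parameter `k` are all hypothetical. In practice a maintained library implementation would likely be preferable.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive (fraud) cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical toy data: 4 fraud points vs. 46 legitimate points (8% fraud).
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
legit = [(5.0 + 0.1 * i, 5.0 - 0.1 * i) for i in range(46)]

# Oversample the fraud class until both classes have the same size.
balanced_fraud = fraud + smote(fraud, n_new=len(legit) - len(fraud))
print(len(balanced_fraud), len(legit))
```

A common precaution, assumed here rather than stated in the abstract, is to apply oversampling only to the training portion of each cross-validation fold so that synthetic points do not leak into the evaluation data.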