Description: |
Data mining is a valuable tool in many applications, especially in the banking sector, where it can be used for loan risk analysis, real-time fraud detection, and classifying customers’ interests for retention and service purposes. The main objective of this research is to develop a predictive model of the closure status of returned-check cases in Qatar. Such a model would help both the banking sector and the Ministry of Justice: banks want to anticipate when a check will be paid, and the Ministry of Justice bears the burden of assigning more judges to cases of this kind. To build the model, four years' worth of data (2014-2017) on returned checks, as registered in the Police Criminal System of the State of Qatar, were obtained. The data were first cleaned of improper input, and the collected attributes were reduced from 16 to 10 by eliminating those with little or no correlation. Seven classifiers were implemented in Python to generate the prediction model: Random Forest (RF), Gaussian Naive Bayes (NB), k-Nearest Neighbors (KNN), Logistic Regression (LR), Decision Tree Classifier (CART), Linear Discriminant (LD), and Support Vector Machine (SVM). The classifiers were evaluated on three main criteria: accuracy, running time, and memory usage. The results show that SVM failed to converge to any output, while RF was the most accurate at 84.4%. KNN and LR performed moderately on all three criteria, while CART had the second-best accuracy with moderate running time and memory usage. The size of the dataset and the correlation between attributes were also found to strongly affect the performance of each algorithm. Moreover, a logarithmic relationship between the accuracy and running time of the classifiers was obtained, with an R² of 0.94.
To this end, several recommendations for future work are proposed to facilitate the use of such predictive models with high confidence in Qatar. |
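
The three evaluation criteria named in the abstract (accuracy, running time, memory usage) and the logarithmic accuracy-versus-time fit could be measured roughly as sketched below. This is a minimal illustration using only the Python standard library, not the study's actual code: `MajorityClassifier`, `evaluate`, and `log_fit_r2` are hypothetical names, the stand-in model is deliberately trivial, and peak memory is approximated with `tracemalloc`, which tracks only allocations made through Python's allocator. Any object exposing `fit` and `predict` (as scikit-learn estimators do) could be passed to `evaluate` instead.

```python
import math
import time
import tracemalloc


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        # y is expected to be a list of labels
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return accuracy, running time (seconds) and peak traced memory (bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    correct = sum(1 for p, t in zip(predictions, y_test) if p == t)
    return {"accuracy": correct / len(y_test), "seconds": elapsed, "peak_bytes": peak}


def log_fit_r2(times, accuracies):
    """Least-squares fit of accuracy = a + b * ln(time); returns (a, b, R^2)."""
    xs = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, accuracies))
    ss_tot = sum((y - mean_y) ** 2 for y in accuracies)
    return a, b, 1.0 - ss_res / ss_tot
```

Calling `evaluate` once per classifier on the same train/test split yields one (accuracy, seconds, peak_bytes) record each; the resulting times and accuracies can then be passed to `log_fit_r2` to check how well a logarithmic curve describes the trade-off.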