Increment of Academic Performance Prediction of At-Risk Student by Dealing With Data Imbalance Problem

Authors: Nguyen Giap Cu, Thi Lich Nghiem, Thi Hoai Ngo, Manh Tuong Lam Nguyen, Hong Quan Phung
Language: English
Year of publication: 2024
Subject:
Source: Applied Computational Intelligence and Soft Computing, Vol 2024 (2024)
Document type: article
ISSN: 1687-9732
DOI: 10.1155/2024/4795606
Description: Studies on automatically predicting student learning outcomes often focus on developing and optimizing machine learning algorithms that fit the data captured from different education systems. This approach has a serious weakness when applied to disadvantaged groups, such as students with academic warnings or those who have dropped out, because these groups are usually much smaller than the other groups. Imbalanced data with a skewed class distribution make it difficult to train good classification models. A prominent way to tackle this challenge is to apply oversampling methods that increase the number of minority-class samples; however, generating good new samples from the existing instances of a minority class remains difficult and requires further investigation. This study presents two new methods for handling data imbalance, based on the original SMOTE and adaptive synthetic sampling (ADASYN) algorithms, called Improved SMOTE (I_SMOTE) and Improved ADASYN (I_ADASYN). The modifications introduce a new method for selecting suitable candidates, based on a new similarity measure and roulette wheel selection, to generate synthetic data samples. The aim is to rebalance the data and thereby improve prediction accuracy for minority groups. The proposed methods were designed for and applied to education datasets, and they were evaluated on public datasets and on a dataset collected from a Vietnamese university. The experimental results on learning datasets showed the high potential of the novel algorithms, I_SMOTE and I_ADASYN, for student academic performance prediction in general and for at-risk student groups in particular. Empirical results showed that the minority-class recall, precision, and F1-score of I_SMOTE and I_ADASYN are substantially better than those of the original balancing algorithms. In addition, I_SMOTE and I_ADASYN achieve relative improvements of 6.6% and 8.0% in ROC area over the original SMOTE and ADASYN, respectively.
Database: Directory of Open Access Journals
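
The description above mentions generating synthetic minority samples by combining a similarity measure with roulette wheel selection. The following is a minimal sketch of that general idea only, not the authors' I_SMOTE or I_ADASYN: the record does not define the similarity measure, so inverse Euclidean distance is used here purely as a placeholder assumption, and the function names `roulette_pick` and `oversample_minority` are hypothetical.

```python
import math
import random


def roulette_pick(weights, rng):
    """Return an index chosen with probability proportional to its weight."""
    total = sum(weights)
    threshold = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    The partner for each base point is drawn from its k nearest minority
    neighbours by roulette wheel selection. The weight 1 / (1 + distance)
    is an ASSUMED similarity, not the measure proposed in the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != base_index]
        others.sort(key=lambda p: math.dist(base, p))
        neighbours = others[:k]
        # Closer neighbours get a larger slice of the roulette wheel.
        weights = [1.0 / (1.0 + math.dist(base, p)) for p in neighbours]
        partner = neighbours[roulette_pick(weights, rng)]
        # Interpolate at a random position on the segment base -> partner.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, partner))
        )
    return synthetic


# Toy minority class with two features (made-up values for illustration).
minority_points = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (4.0, 4.0)]
new_points = oversample_minority(minority_points, n_new=5, k=2, seed=42)
```

Each synthetic point lies on a line segment between two existing minority samples, as in plain SMOTE; the only change in this sketch is that the partner is drawn with similarity-weighted probability instead of uniformly at random.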