Predictive system for characterizing low performance of Undergraduate students using machine learning techniques

Author: Ekubo, E.A.
Contributors: Esiefarienrhe, B., Gasela, N., 25840525 - Esiefarienrhe, Bukohwo Michael (Supervisor), 24704113 - Gasela, Naison (Supervisor)
Language: English
Year of publication: 2020
Description: PhD (Computer Science and Information Systems), North-West University, Mafikeng Campus. One challenge facing educational institutions is the low academic performance of students. This challenge affects students, tutors, institutions and society in a variety of ways. To deal with this problem, researchers have applied several methods, most recently data mining methods. This thesis considers the factors that affect low academic performance in Nigeria, employs machine-learning techniques to design models that assist with the classification of students' performance, and develops software that classifies students into different performance groups without the use of data mining tools. The data used for this research were collected from undergraduate students' records at the Niger Delta University, Bayelsa State, Nigeria. The CRISP-DM methodology was used for the data mining aspect, while an agile methodology was used for the software development. The modelling was carried out using the WEKA tool. Five (5) machine-learning algorithms, namely J48 decision tree, logistic regression, multilayer perceptron, naïve Bayes and sequential minimal optimization, were used in the data mining to select the algorithm that produces the best model for the data. To analyse the model built by each machine-learning algorithm, six (6) evaluation metrics were used, namely recall (sensitivity), specificity, ROC area, F-measure, Kappa statistic and root mean squared error (RMSE). At the end of the modelling process, the research found the multilayer perceptron to be the best classifier for the dataset. This study also considers four feature selection techniques, namely Correlation, Gain Ratio, Information Gain and ReliefF, to select the most relevant features out of the 24 features gathered in the dataset. The feature selection procedure identified the sixteen (16) most relevant features.
Having identified the best classifier for the dataset, the study went further to develop novel predictive software, using the PHP and Python programming languages, that implements the multilayer perceptron model with the best features identified in the modelling phase. The software is a contribution of this research that enables institutions to quickly identify students' performance without prior knowledge of machine-learning tools. To evaluate the performance of the software, the research used the test dataset and entered the attribute values for each student record. The evaluation shows that the software achieves 98% accuracy, which indicates a high level of dependability. Doctoral
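
Several of the evaluation metrics named in the abstract can be derived from the counts in a binary confusion matrix. The following is a minimal sketch in Python, not the thesis's own code (the modelling there was done in WEKA). The function names `binary_metrics` and `rmse` and the example counts are hypothetical, the sketch assumes a two-class problem with non-zero denominators, and ROC area is left out because it needs ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return recall, specificity, F-measure and Cohen's kappa
    from binary confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)            # sensitivity: share of positives found
    specificity = tn / (tn + fp)       # share of negatives correctly rejected
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative counts only; these are not results from the thesis.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects accuracy for the agreement expected by chance, which is one reason it is often reported alongside accuracy when the performance classes are imbalanced.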
Database: OpenAIRE