Popis: |
Modelling and prediction of spatially distributed data, such as secondary cassiterite mineral distributions, are often affected by spatial autocorrelation (SAC), a phenomenon that violates the assumption that attribute data are independent in space. SAC leads to Type I errors in classical statistics and to overfitting or underfitting in machine learning (ML) classification. The traditional random holdout technique of model validation does not properly address overfitting and underfitting on spatially distributed datasets, which makes it difficult to assess the performance of predictive spatial models. This thesis presents an approach to predictive modelling and performance evaluation of a spatially distributed secondary mineral dataset, represented as points, using supervised ML classification. The work involves a systematic geological data survey of existing mineral location coordinate points and other mineralisation attributes in the Plateau Younger Granite Region (PYGR) of Nigeria. The predictive attribute values are extracted with GIS from a 2D space of discrete coordinate points into an ML-ready format of dimension 749 by 21 (observational data points by predictive attributes), with two classes, 0 and 1, representing mineralised and non-mineralised points respectively. The attributes describing secondary mineral formation were used to build a point-based predictive spatial model for mineral potential mapping (PSM-MPM), whose performance was assessed with the random holdout validation technique.
The thesis evaluates the predictive performance of the PSM-MPM with respect to overfitting and underfitting by proposing a novel validation technique, spatial strip splitting (SSS), which splits the predictive data spatially into training and testing sets. The proposed method reveals the detrimental effects of the overfitting and underfitting associated with conventional ML classification model validation by random holdout (RHO) or cross-validation. The work also carries out a comparative analysis of PSM-MPM performance across three evaluation techniques: PCA-RHO, in which the attribute data are preprocessed with principal component analysis (PCA) to select the best attribute subsets; RHO without preprocessing; and the novel SSS validation technique. The results show that SSS is the ideal method for assessing PSM-MPM performance, because it clearly exposes the detrimental effects of both overfitting and underfitting and provides more informative performance results when implementing PSM-MPM. |
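
The difference between random holdout and a strip-based spatial split can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify strip orientation, strip count, or which strips are held out, so the vertical strips along x, the `n_strips` and `test_every` parameters, the function names, and the synthetic coordinates below are all assumptions made for illustration. Only the sample count of 749 points comes from the abstract.

```python
import random


def random_holdout(points, test_fraction=0.3, seed=0):
    """Random holdout (RHO): shuffle indices and hold out a random fraction."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    n_test = int(round(len(points) * test_fraction))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def spatial_strip_split(points, n_strips=10, test_every=3):
    """Strip-based split (assumed variant): cut the area into vertical
    strips along x and hold out every `test_every`-th strip for testing."""
    xs = [p[0] for p in points]
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / n_strips or 1.0  # guard against zero extent
    train, test = [], []
    for i, (x, _y) in enumerate(points):
        # Clamp so the point at x_max falls in the last strip.
        strip = min(int((x - x_min) / width), n_strips - 1)
        if strip % test_every == test_every - 1:
            test.append(i)
        else:
            train.append(i)
    return train, test


# Synthetic coordinates standing in for the 749 mineral location points.
rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 50)) for _ in range(749)]

rho_train, rho_test = random_holdout(points)
sss_train, sss_test = spatial_strip_split(points)
print("RHO train/test sizes:", len(rho_train), len(rho_test))
print("SSS train/test sizes:", len(sss_train), len(sss_test))
```

Under random holdout, test points are scattered among their training neighbours, so spatial autocorrelation can inflate the measured accuracy. With the strip split, each held-out point lies in a strip that contains no training points, which is the kind of spatial separation the SSS technique relies on.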