Handling imbalanced data in supervised machine learning for lithological mapping using remote sensing and airborne geophysical data

Authors: Nugroho Hary, Wikantika Ketut, Bijaksana Satria, Saepuloh Asep
Language: English
Year of publication: 2023
Source: Open Geosciences, Vol 15, Iss 1, Pp 1-16 (2023)
Document type: article
ISSN: 2391-5447
DOI: 10.1515/geo-2022-0487
Description: With balanced training sample (TS) data, learning algorithms give good results in lithology classification. However, lithological mapping of previously unmapped remote areas is expected to be difficult because the available samples are limited and imbalanced. Several techniques can address this issue, including ensemble learning (such as random forest [RF]), oversampling and undersampling, class weight tuning, and hybrid approaches. This work investigates and analyses several strategies for dealing with imbalanced data in RF-based lithological classification, using remote sensing and airborne geophysical data with limited drill log samples. The research was carried out at Komopa, Paniai District, Papua Province, Indonesia. Class weight tuning, oversampling, and balanced class weight procedures were applied with TS sizes ranging from 25 to 500. In general, oversampling outperformed class weight tuning and balanced class weight, with testing accuracy of 0.70–0.80, F1 score of 0.43–0.56, and Kappa score of 0.32–0.59. Visual comparison of the maps also showed that oversampling gave the most reliable classifications: classifier performance is best when the imbalance ratio is proportional to the area covered by each lithology class.
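As an illustration only, the following pure-Python sketch shows generic forms of two of the imbalance-handling ideas named in the abstract, random oversampling and the "balanced" class-weight heuristic (weight = n_samples / (n_classes × class_count), the formula scikit-learn applies for `class_weight="balanced"`), together with the three reported metrics. It is not the authors' code: the abstract does not say which oversampler was used (random duplication is assumed here; SMOTE-style synthesis is another option) or how F1 was averaged (macro averaging is assumed). All function names and the toy labels "A"/"B" are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """'Balanced' heuristic: weight = n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes equally to the total training loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Random oversampling (assumed variant): resample minority classes with
    replacement until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for cls, xs in by_class.items():
        # Keep every original sample, then add random duplicates of this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([cls] * target)
    return out_x, out_y


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    y = ["A"] * 6 + ["B"] * 2          # hypothetical 6:2 class imbalance
    print(balanced_class_weights(y))   # {'A': 0.666..., 'B': 2.0}
    xs, ys = random_oversample(list(range(8)), y)
    print(Counter(ys))                 # Counter({'A': 6, 'B': 6})
    always_a = ["A"] * 8               # a classifier that ignores class B
    print(accuracy(y, always_a), macro_f1(y, always_a), cohen_kappa(y, always_a))
```

The toy run shows why accuracy is reported alongside F1 and Kappa: a classifier that never predicts the minority class still reaches 0.75 accuracy, while its macro F1 is about 0.43 and its Kappa is 0.0.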
Database: Directory of Open Access Journals