Mitigating shortage of labeled data using clustering-based active learning with diversity exploration

Author: Yan, Xuyang; Nazmi, Shabnam; Gebru, Biniam; Anwar, Mohd; Homaifar, Abdollah; Sarkar, Mrinmoy; Gupta, Kishor Datta
Publication year: 2022
Subject:
Document type: Working Paper
Description: In this paper, we propose a new clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling (ALCS), to address the shortage of labeled data. ALCS employs a density-based clustering approach to explore the cluster structure of the data without requiring exhaustive parameter tuning. A bi-cluster boundary-based sample query procedure is introduced to improve the learning performance when classifying highly overlapped classes. Additionally, we develop an effective diversity exploration strategy to address the redundancy among queried samples. Our experimental results support the efficacy of the ALCS approach.
Comment: Accepted by the ICML 2022 Workshop on Adaptive Experimental Design and Active Learning in the Real World
Database: arXiv