On the Influence of Data Imbalance on Supervised Gaussian Mixture Models

Author: Luca Scrucca
Language: English
Year of publication: 2023
Subject:
Source: Algorithms, Vol 16, Iss 12, p 563 (2023)
Document type: article
ISSN: 1999-4893
DOI: 10.3390/a16120563
Description: Imbalanced data present a pervasive challenge in many real-world applications of statistical and machine learning, where the instances of one class significantly outnumber those of the other. This paper examines the impact of class imbalance on the performance of Gaussian mixture models in classification tasks and establishes the need for a strategy to reduce the adverse effects of imbalanced data on the accuracy and reliability of classification outcomes. We explore various strategies to address this problem, including cost-sensitive learning, threshold adjustments, and sampling-based techniques. Through extensive experiments on synthetic and real-world datasets, we evaluate the effectiveness of these methods. Our findings emphasize the need for effective mitigation strategies for class imbalance in supervised Gaussian mixtures, offering valuable insights for practitioners and researchers in improving classification outcomes.
Database: Directory of Open Access Journals
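
Illustration: the threshold adjustment mentioned in the description can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's implementation: each class density is a single univariate Gaussian (the one-component special case of a supervised Gaussian mixture), the data are simulated, and all function names (fit_class_gaussians, posterior_minority, classify, sensitivity, simulate) are hypothetical. Using the minority-class prior as the adjusted threshold is one common heuristic, chosen here only for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(0)


def fit_class_gaussians(x, y):
    """Fit one univariate Gaussian per class, plus empirical class priors.

    Assumption: one component per class, a simplification of a full
    supervised Gaussian mixture with several components per class.
    """
    model = {}
    for label in sorted(set(y)):
        values = [xi for xi, yi in zip(x, y) if yi == label]
        model[label] = {
            "prior": len(values) / len(x),
            "dist": NormalDist(fmean(values), stdev(values)),
        }
    return model


def posterior_minority(model, xi, minority=1):
    """Posterior probability of the minority class via Bayes' rule."""
    joint = {k: m["prior"] * m["dist"].pdf(xi) for k, m in model.items()}
    total = sum(joint.values())
    # Guard against numerical underflow far in the tails.
    return joint[minority] / total if total > 0 else 0.0


def classify(model, xi, threshold=0.5):
    """Binary rule: predict class 1 when its posterior reaches the threshold."""
    return 1 if posterior_minority(model, xi) >= threshold else 0


def sensitivity(model, x, y, threshold):
    """Fraction of true minority cases (label 1) predicted as minority."""
    minority_points = [xi for xi, yi in zip(x, y) if yi == 1]
    hits = sum(classify(model, xi, threshold) == 1 for xi in minority_points)
    return hits / len(minority_points)


def simulate(n_major, n_minor):
    """Simulated imbalanced data: majority ~ N(0, 1), minority ~ N(2, 1)."""
    x = [random.gauss(0.0, 1.0) for _ in range(n_major)]
    x += [random.gauss(2.0, 1.0) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return x, y


x_train, y_train = simulate(950, 50)
x_test, y_test = simulate(950, 50)
model = fit_class_gaussians(x_train, y_train)

# Default rule (posterior >= 0.5) versus a threshold lowered to the
# minority-class prior.
sens_default = sensitivity(model, x_test, y_test, 0.5)
sens_adjusted = sensitivity(model, x_test, y_test, model[1]["prior"])
print(f"sensitivity at 0.5: {sens_default:.2f}")
print(f"sensitivity at prior threshold: {sens_adjusted:.2f}")
```

Lowering the threshold can never reduce minority-class sensitivity, because every point classified as minority at 0.5 is still classified as minority at a lower cut-off. The cost is a higher false-positive rate on the majority class, so the threshold would normally be tuned against a metric that weighs both kinds of error.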