Estimating Per-Class Statistics for Label Noise Learning.

Autor: Luo W, Chen S, Liu T, Han B, Niu G, Sugiyama M, Tao D, Gong C
Jazyk: angličtina
Zdroj: IEEE transactions on pattern analysis and machine intelligence [IEEE Trans Pattern Anal Mach Intell] 2024 Sep 23; Vol. PP. Date of Electronic Publication: 2024 Sep 23.
DOI: 10.1109/TPAMI.2024.3466182
Abstrakt: Real-world data may contain a considerable amount of noisily labeled examples, which usually mislead the training algorithm and result in degraded classification performance on test data. Therefore, Label Noise Learning (LNL) was proposed, of which one popular research trend focused on estimating the critical statistics (e.g., sample mean and sample covariance), to recover the clean data distribution. However, existing methods may suffer from the unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by the centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection from the instance level. Moreover, our PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on the noisy dataset for boosting their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conducted intensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy when compared with state-of-the-art methods in LNL.
Databáze: MEDLINE