Resilient Learning of Computational Models With Noisy Labels

Authors: Yongcan Cao, Feng Tao
Year of publication: 2021
Subject:
Source: IEEE Transactions on Emerging Topics in Computational Intelligence. 5:351-360
ISSN: 2471-285X
DOI: 10.1109/tetci.2019.2917704
Description: The resistance of computational models to label noise offers promising potential for correcting erroneous labels. One intuitive approach is to re-label the data samples based on the model's prediction when the original label error rate is relatively high. However, directly flipping labels to the model's prediction may not improve dataset quality, because the flipping process concentrates noisy labels in small regions, i.e., it increases noise condensity. Given the same label accuracy, datasets with condensed noise lead to (much) worse learned models than those without condensed noise. Hence, the quality of the dataset may not benefit from this correction process. Moreover, iteratively flipping labels typically decreases label accuracy rather than yielding a stabilized error rate around which the error rate slowly oscillates at each subsequent iteration. In this paper, we propose a novel method that simultaneously reduces the label error rate and improves dataset quality (by reducing noise condensity). In contrast to existing methods that either involve humans in the label correction process or construct multiple models to obtain a consensus opinion, our proposed method is simple and can automatically improve the quality of datasets. Specifically, we propose using a small clean dataset to evaluate the overfitting caused by the concentration of label noise. Once the noise condensity issue is detected, the label flipping process is modified by adding a statistical probability to the flipping procedure. The proposed method is verified on noisy MNIST and CIFAR-10 datasets. Label correction results are presented, and the prediction accuracies of neural network models trained on the corrected datasets are compared with results from methods that target learning from noisy labels.
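The two mechanisms named in the description (checking a small clean dataset for overfitting, and flipping labels with a probability rather than always) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `overfit_gap_detected` and `flip_labels`, the accuracy-gap criterion, the `max_gap` threshold, and the uniform `flip_prob` are all assumptions made here for illustration, since the record does not specify how the detection or the probability is defined.

```python
import numpy as np


def overfit_gap_detected(train_acc, clean_acc, max_gap=0.05):
    """Hypothetical detector (an assumption, not from the paper): report a
    possible noise-condensity problem when accuracy on the noisy training
    labels exceeds accuracy on the small clean set by more than max_gap."""
    return (train_acc - clean_acc) > max_gap


def flip_labels(labels, predictions, flip_prob=1.0, rng=None):
    """Replace a label with the model's prediction where the two disagree,
    accepting each candidate flip independently with probability flip_prob.

    flip_prob=1.0 is plain deterministic flipping; a smaller value is one
    assumed way of "adding a statistical probability" to the procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    disagree = labels != predictions
    # One uniform draw in [0, 1) per sample decides whether a flip is accepted.
    accept = rng.random(labels.shape[0]) < flip_prob
    return np.where(disagree & accept, predictions, labels)


# Illustrative usage with made-up labels and predictions.
noisy = np.array([0, 1, 2, 3, 4])
predicted = np.array([0, 1, 7, 3, 9])
p = 0.5 if overfit_gap_detected(train_acc=0.99, clean_acc=0.90) else 1.0
corrected = flip_labels(noisy, predicted, flip_prob=p,
                        rng=np.random.default_rng(0))
```

In this sketch the flipping stays fully deterministic until the clean-set check fires, after which only a random subset of the disagreeing labels is changed in each iteration.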
Database: OpenAIRE