A Generalized Approach to Determine Confident Samples for Deep Neural Networks on Unseen Data

Authors: Kevin H. Leung, Gopal B. Avinash, Ma Zili, Min Zhang, Jin Wen
Year of publication: 2019
Subject:
Source: Uncertainty for Safe Utilization of Machine Learning in Medical Imaging and Clinical Image-Based Procedures ISBN: 9783030326883
UNSURE/CLIP@MICCAI
DOI: 10.1007/978-3-030-32689-0_7
Description: Deep neural network (DNN) models are widely applied in biomedical image studies since DNN models take advantage of massive data to provide improved performance over traditional machine learning models. However, like any other data-driven model, DNN models still face generalization limitations. For example, a model trained on clinical data from one hospital may not perform as well on data from another hospital. In this work, a novel approach is proposed to determine confident samples from unseen data on which a DNN model will have improved performance. Confident samples are defined as inliers identified by an outlier detector, which is based on projection of training data onto a standard feature space (e.g., ImageNet feature space). The hypothesis of the proposed method is that in a standard feature space, a DNN model will perform better on the inlier data samples and more poorly on the outliers. When the unseen data are projected onto a standard feature space, data points detected as inliers are those on which the model will likely have consistent performance, as those patterns have already been “seen” in the training dataset. To validate our hypothesis, experiments were conducted using publicly available digit image datasets and chest X-ray images from three unseen datasets collected from hospitals across the U.S. and Canada. The experimental results showed consistently improved performance across various DNN models on all confident samples from unseen datasets.
Database: OpenAIRE
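
Note: The inlier selection described in the abstract can be illustrated with a minimal sketch. The record does not say which outlier detector or feature extractor the authors used, so everything here is an assumption: the feature vectors are synthetic stand-ins for what a fixed pretrained network would produce, the detector is a simple centroid-distance threshold swapped in for illustration, and the names `fit_detector`, `confident_mask` and `quantile` are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_detector(train_features, quantile=0.95):
    """Fit a stand-in outlier detector on training features (an assumption,
    not the detector from the paper).

    Returns the training centroid and a distance threshold chosen so that
    roughly `quantile` of the training samples fall inside it.
    """
    center = centroid(train_features)
    dists = sorted(distance(f, center) for f in train_features)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[index]


def confident_mask(features, center, threshold):
    """True for inliers (confident samples), False for outliers."""
    return [distance(f, center) <= threshold for f in features]


# Synthetic stand-ins for features in a fixed, standard feature space.
rng = random.Random(0)
train = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
unseen_similar = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
unseen_shifted = [[rng.gauss(6.0, 1.0) for _ in range(8)] for _ in range(100)]

center, threshold = fit_detector(train)
mask_similar = confident_mask(unseen_similar, center, threshold)
mask_shifted = confident_mask(unseen_shifted, center, threshold)
print(sum(mask_similar), "of", len(mask_similar), "similar samples kept")
print(sum(mask_shifted), "of", len(mask_shifted), "shifted samples kept")
```

Under the abstract's hypothesis, a model's performance would then be reported only on the unseen samples whose mask entry is True, since those resemble the training data in the shared feature space.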