Description: |
While classification performance has improved with the adoption of neural network models, the cost of acquiring and labeling the data required to outperform other classification methods is often prohibitively high. Semi-supervised learning attempts to incorporate unlabeled data into the learning process, which can improve performance; however, such methods assume preexisting, static sets of labeled and unlabeled data, which are often difficult to obtain for novel problems. Active learning addresses these problems by determining which unlabeled samples will, when labeled, best improve a supervised model's performance. Existing methods for prioritizing samples have primarily been considered in isolation, despite the existence of the Information Density framework for combining them. We employ this framework to combine the current state-of-the-art uncertainty-based method with a novel similarity-based method to improve performance. We also extend the framework itself by considering a dynamic combination of these two methods that shifts priority from one to the other. This iterative process of expanding the labeled set with data prioritized by our acquisition function enables the creation of powerful classification models at greatly reduced cost. |