Using error decay prediction to overcome practical issues of deep active learning for named entity recognition

Authors: Shankar Vembu, Haw-Shiuan Chang, Andrew McCallum, Sunil Mohan, Rheeya Uppaal
Publication year: 2020
Subject:
Source: Machine Learning. 109:1749-1778
ISSN: 1573-0565, 0885-6125
DOI: 10.1007/s10994-020-05897-1
Description: Existing deep active learning algorithms achieve impressive sampling efficiency on natural language processing tasks. However, they exhibit several weaknesses in practice, including (a) inability to use uncertainty sampling with black-box models, (b) lack of robustness to labeling noise, and (c) lack of transparency. In response, we propose a transparent batch active sampling framework by estimating the error decay curves of multiple feature-defined subsets of the data. Experiments on four named entity recognition (NER) tasks demonstrate that the proposed methods significantly outperform diversification-based methods for black-box NER taggers, and can make the sampling process more robust to labeling noise when combined with uncertainty-based methods. Furthermore, the analysis of experimental results sheds light on the weaknesses of different active sampling strategies, and when traditional uncertainty-based or diversification-based methods can be expected to work well.
This is a pre-print of an article published in Springer Machine Learning journal. The final authenticated version is available online at: https://doi.org/10.1007/s10994-020-05897-1
Database: OpenAIRE