Active Learning-Based Approach for Named Entity Recognition on Short Text Streams

Authors: Ngoc Thanh Nguyen, Tuong Tri Nguyen, Dinh Tuyen Hoang, Dosam Hwang, Cuong Van Tran
Year of publication: 2016
Subject:
Source: Advances in Intelligent Systems and Computing ISBN: 9783319439815
MISSI
Description: Named entity recognition (NER) plays an important role in many natural language processing (NLP) applications and is one of the fundamental tasks in building NLP systems. Supervised learning methods can achieve high performance, but they require a large amount of training data, which is time-consuming and expensive to obtain. Active learning (AL) is well suited to many NLP problems, where unlabeled data may be abundant but labeled data is limited. AL aims to minimize annotation cost while maximizing model performance. This study proposes a method for classifying named entities in tweet streams on Twitter by using AL with different query strategies. Samples were selected for labeling by human annotators using query by committee and diversity-based querying. Experiments on tweet data showed promising results, with the proposed method outperforming the baseline.
Database: OpenAIRE
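
The two query strategies named in the description, query by committee and diversity-based querying, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say how disagreement or diversity was measured, so the vote-entropy score, the cosine-similarity filter, the 0.9 threshold, and all function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement for one sample (0 means full agreement)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(committee_votes, features, k, max_similarity=0.9):
    """Pick up to k sample indices to send to human annotators.

    committee_votes[i] holds the label each committee member predicted for
    sample i; features[i] is a hypothetical feature vector for that sample.
    Samples are visited from highest to lowest disagreement, and a sample is
    skipped if it is too similar to one already chosen (assumed threshold).
    """
    ranked = sorted(
        range(len(committee_votes)),
        key=lambda i: vote_entropy(committee_votes[i]),
        reverse=True,
    )
    chosen = []
    for i in ranked:
        if len(chosen) >= k:
            break
        if all(cosine(features[i], features[j]) <= max_similarity for j in chosen):
            chosen.append(i)
    return chosen


# Toy data: three committee members label four tweets.
votes = [
    ["PER", "PER", "PER"],  # full agreement
    ["PER", "LOC", "ORG"],  # maximal disagreement
    ["PER", "LOC", "ORG"],  # maximal disagreement, near-duplicate of sample 1
    ["LOC", "LOC", "ORG"],  # partial disagreement
]
features = [[1, 0], [0, 1], [0, 1], [1, 1]]
queries = select_queries(votes, features, k=2)
```

In this toy run, sample 2 is skipped despite its high disagreement because its feature vector duplicates sample 1, so the second query goes to the more diverse sample 3.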