Active Learning-Based Approach for Named Entity Recognition on Short Text Streams
Author: | Ngoc Thanh Nguyen, Tuong Tri Nguyen, Dinh Tuyen Hoang, Dosam Hwang, Cuong Van Tran |
---|---|
Year of publication: | 2016 |
Subject: | Training set; Active learning (machine learning); Computer science; Supervised learning; Pattern recognition; 02 engineering and technology; Annotation; Named-entity recognition; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Labeled data; 020201 artificial intelligence & image processing; Artificial intelligence; Baseline (configuration management); Natural language processing |
Source: | Advances in Intelligent Systems and Computing; ISBN 9783319439815; MISSI |
Description: | The named entity recognition (NER) problem plays an important role in many natural language processing (NLP) applications and is one of the fundamental tasks in building NLP systems. Supervised learning methods can achieve high performance, but they require a large amount of training data that is time-consuming and expensive to obtain. Active learning (AL) is well suited to many NLP problems in which unlabeled data is abundant but labeled data is limited. AL aims to minimize annotation cost while maximizing model performance. This study proposes a method for classifying named entities in Tweet streams on Twitter using AL with different query strategies. Samples were selected for labeling by human annotators using query by committee and diversity-based querying. Experiments on Tweet data yielded promising results that outperformed the baseline. |
Database: | OpenAIRE |
External link: |
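The record names the two query strategies (query by committee and diversity-based querying) but gives no details of how they are computed. The following is a minimal, hypothetical sketch of how such a selection step could look, under these assumptions (none taken from the paper): each committee member outputs a BIO tag sequence per tweet; disagreement is measured as vote entropy averaged over tokens; and diversity is enforced by skipping any candidate whose lower-cased token set has a Jaccard similarity above a threshold (here 0.5) with a tweet already chosen. The function names, the threshold, and the toy data are illustrative only.

```python
import math
from collections import Counter

# Assumption: a tagging is one BIO label per token, produced by one committee member.
Tagging = list[str]


def vote_entropy(committee_tags: list[Tagging]) -> float:
    """Per-token vote entropy averaged over the tweet (higher = more committee disagreement)."""
    n_members = len(committee_tags)
    n_tokens = len(committee_tags[0])
    if n_tokens == 0:
        return 0.0
    total = 0.0
    for i in range(n_tokens):
        votes = Counter(tags[i] for tags in committee_tags)
        for count in votes.values():
            p = count / n_members
            total -= p * math.log(p)
    return total / n_tokens


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (assumed diversity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_batch(tweets, committee_predictions, batch_size, max_similarity=0.5):
    """Rank tweets by disagreement, then greedily skip near-duplicates of chosen ones."""
    ranked = sorted(
        range(len(tweets)),
        key=lambda i: vote_entropy(committee_predictions[i]),
        reverse=True,
    )
    chosen, chosen_tokens = [], []
    for i in ranked:
        if len(chosen) == batch_size:
            break
        tokens = {tok.lower() for tok in tweets[i]}
        if all(jaccard(tokens, other) <= max_similarity for other in chosen_tokens):
            chosen.append(i)
            chosen_tokens.append(tokens)
    return chosen


# Hypothetical toy data: tokenized tweets and three committee members' tag sequences.
tweets = [
    ["Obama", "visits", "Paris"],
    ["Obama", "visits", "Paris", "today"],
    ["nice", "weather"],
    ["Apple", "opens", "store", "in", "Seoul"],
]
preds = [
    [["B-PER", "O", "B-LOC"], ["B-PER", "O", "B-ORG"], ["B-ORG", "O", "B-LOC"]],
    [["B-PER", "O", "B-LOC", "O"], ["B-PER", "O", "B-ORG", "O"], ["B-ORG", "O", "B-LOC", "O"]],
    [["O", "O"], ["O", "O"], ["O", "O"]],
    [["B-ORG", "O", "O", "O", "B-LOC"], ["B-ORG", "O", "O", "O", "B-PER"], ["B-PER", "O", "O", "O", "B-LOC"]],
]
print(select_batch(tweets, preds, batch_size=2))  # → [0, 3]
```

In this toy run, tweet 1 ranks second by disagreement but is skipped because it nearly duplicates tweet 0 (Jaccard 0.75), so the batch sent to annotators covers two distinct contexts instead of one repeated one. Setting `max_similarity=1.0` disables the diversity filter and returns `[0, 1]`.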