Comparison of Text Mining Models for Food and Dietary Constituent Named-Entity Recognition

Authors: Nadeesha Perera, Thi Thuy Linh Nguyen, Matthias Dehmer, Frank Emmert-Streib
Language: English
Year of publication: 2022
Subject:
Source: Machine Learning and Knowledge Extraction, Vol 4, Iss 1, Pp 254-275 (2022)
Document type: article
ISSN: 2504-4990
DOI: 10.3390/make4010012
Description: Biomedical Named-Entity Recognition (BioNER) has become an essential part of text mining due to the continuously growing digital archives of biological and medical articles. While there are many well-performing BioNER tools for entities such as genes, proteins, diseases or species, there is very little research into food and dietary constituent named-entity recognition. For this reason, in this paper, we study seven BioNER models for food and dietary constituent recognition. Specifically, we study a dictionary-based model, a conditional random fields (CRF) model and a new hybrid model, called FooDCoNER (Food and Dietary Constituents Named-Entity Recognition), which we introduce by combining the former two models. In addition, we study four deep language models: BERT, BioBERT, RoBERTa and ELECTRA. We find that FooDCoNER not only leads to the overall best results, comparable with those of the deep language models, but is also much more efficient with respect to run time and the sample size requirements of the training data. The latter was identified by studying learning curves. Overall, our results not only provide a new tool for food and dietary constituent NER but also shed light on the difference between classical machine learning models and recent deep language models.
Database: Directory of Open Access Journals