Bridge inspection named entity recognition via BERT and lexicon augmented machine reading comprehension neural model

Authors: Di Wang, Shixin Jiang, Ren Li, Jianxi Yang, Tianjin Mo, Dong Li
Year of publication: 2021
Subject:
Source: Advanced Engineering Informatics. 50:101416
ISSN: 1474-0346
Description: Bridge inspection reports are an important data source in bridge management and contain large volumes of fine-grained data, including information on bridge members and structural defects. However, because automatic information extraction has been little studied in this field, this valuable inspection information has not been fully utilized. In particular, Chinese bridge inspection entities involve domain-specific vocabulary and are often nested, so most existing named entity recognition (NER) solutions are not suitable. To address this problem, this paper proposes a novel lexicon-augmented, machine reading comprehension-based neural NER model for identifying flat and nested entities in Chinese bridge inspection text. The model takes the bridge inspection text and predefined question queries as input, which enhances contextual feature representation and integrates prior knowledge. Character-level features encoded by the pre-trained BERT model are combined with bigram embeddings and weighted lexicon features into a context representation. A bidirectional long short-term memory network then extracts sequence features before the spans of named entities are predicted. The model is evaluated on a Chinese bridge inspection named entity corpus, and the experimental results show that it outperforms other mainstream NER models on this corpus. The model not only provides a basis for automatic extraction of bridge inspection information but also supports downstream tasks such as knowledge graph construction and question answering systems.
Database: OpenAIRE
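
The description above frames NER as machine reading comprehension: each entity type gets its own question query, and the model predicts start and end positions of answer spans in the text. The following is a minimal sketch of that framing only, not the authors' implementation. The entity types, query wording, English tokens, and the nearest-end matching rule are all assumptions made for illustration; the BERT encoder, bigram and lexicon features, and BiLSTM are replaced by hand-written start/end labels.

```python
from typing import Dict, List, Tuple

# Hypothetical entity types and queries; the abstract does not list the real ones.
QUERIES: Dict[str, str] = {
    "defect": "Which structural defects are mentioned in the text?",
    "member": "Which bridge members are mentioned in the text?",
    "location": "Which locations on the bridge are mentioned in the text?",
}


def build_inputs(text: str) -> List[Tuple[str, str, str]]:
    """Pair the text with one query per entity type: (type, query, text)."""
    return [(etype, query, text) for etype, query in QUERIES.items()]


def decode_spans(start: List[int], end: List[int]) -> List[Tuple[int, int]]:
    """Match each predicted start to the nearest predicted end at or after it.

    This matching rule is an assumption; other span-matching schemes exist.
    """
    spans = []
    for i, is_start in enumerate(start):
        if is_start != 1:
            continue
        for j in range(i, len(end)):
            if end[j] == 1:
                spans.append((i, j))  # inclusive token indices
                break
    return spans


def extract(
    tokens: List[str],
    predictions: Dict[str, Tuple[List[int], List[int]]],
) -> List[Tuple[str, str, int, int]]:
    """Turn per-type start/end labels into (type, text, start, end) entities."""
    entities = []
    for etype, (start, end) in predictions.items():
        for i, j in decode_spans(start, end):
            entities.append((etype, " ".join(tokens[i : j + 1]), i, j))
    return entities


# Toy English example standing in for Chinese inspection text.
tokens = ["crack", "in", "main", "girder", "web"]

# Hand-written labels standing in for the model's start/end predictions.
predictions = {
    "defect": ([1, 0, 0, 0, 0], [1, 0, 0, 0, 0]),
    "member": ([0, 0, 1, 0, 0], [0, 0, 0, 1, 0]),
    "location": ([0, 0, 1, 0, 0], [0, 0, 0, 0, 1]),
}

entities = extract(tokens, predictions)
for entity in entities:
    print(entity)
```

Because every entity type is decoded from its own query, the "member" span (tokens 2 to 3) and the "location" span (tokens 2 to 4) can overlap, which is how a query-per-type formulation can represent nested entities that a single flat tag sequence cannot.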