Beyond extraction accuracy: addressing the quality of geographical named entity through advanced recognition and correction models using a modified BERT framework

Authors: Liuchang Xu, Jiajun Zhang, Chengkun Zhang, Xinyu Zheng, Zhenhong Du, Xingyu Xue
Language: English
Publication year: 2024
Subject:
Source: Geo-spatial Information Science, pp. 1-19 (2024)
Document type: article
ISSN: 1009-5020, 1993-5153
DOI: 10.1080/10095020.2024.2354229
Description: In the realm of geospatial services and applications, the accuracy of address information is of utmost importance. Traditional methods of data collection, being both labor-intensive and costly, have prompted researchers to turn to Volunteered Geographic Information (VGI) for the extraction of Geographical Named Entities (GNEs). Nevertheless, prior studies have predominantly concentrated on enhancing extraction accuracy, while often overlooking the critical aspect of GNE quality. This study addresses this gap by employing a multifaceted approach. Initially, a Geographical Named Entity Semantic Model (GNESM) was constructed by improving the BERT framework and conducting ablation experiments on multiple influencing factors to verify its feasibility. Based on GNESM, a Geographical Named Entity Recognition Model (GNERM) was constructed by incremental pre-training with social media text data and fine-tuning to achieve a recognition accuracy of 90.9%. Subsequently, a Geographical Named Entity Error Correction Model (GNEECM) was constructed by training GNESM with standard GNE data and incorporating error detection and correction modules, achieving an accuracy of 96.6% in error detection and correction tasks. The experimental results demonstrate that the proposed identification and correction methods outperform all compared methods. Through the identification and correction process, this study obtained high-quality GNE data, providing a reference for expanding standard address libraries and for subsequent research on geographical named entities.
Database: Directory of Open Access Journals