Non-Standard Address Parsing in Chinese Based on Integrated CHTopoNER Model and Dynamic Finite State Machine

Autor: Mengwei Zhang, Xingui Liu, Jingzhen Ma, Zheng Zhang, Yue Qiu, Zhipeng Jiang
Jazyk: angličtina
Rok vydání: 2023
Předmět:
Zdroj: Applied Sciences, Vol 13, Iss 17, p 9855 (2023)
Druh dokumentu: article
ISSN: 2076-3417
DOI: 10.3390/app13179855
Popis: Information in non-standard address texts in Chinese is usually presented with rough content, complex and diverse presentation forms, and inconsistent hierarchical granularity, causing low accuracy in Chinese address parsing. Therefore, we propose a method for parsing non-standard address text in Chinese that integrates the Chinese Toponym Named Entity Recognition (CHTopoNER) model and a dynamic finite state machine (FSM). First, named entity recognition is performed by the CHTopoNER model. Sets of dynamic FSMs are then constructed based on the address hierarchical characteristics to sort and combine the Chinese address elements, thereby achieving address parsing on the Chinese internet. This method showed excellent accuracy in parsing both standard and non-standard placename addresses. In particular, this method performed better in address parsing for disordered or missing hierarchical elements than traditional methods using an FSM. Specifically, this method achieved accuracies of 96.6% and 96.8% for standard and non-standard placenames, respectively. These accuracies increased by 8.0% and 57.1%, respectively, compared with the integrated CHTopoNER model and traditional FSM, and by 7.4% and 19.8%, respectively, compared with the integrated CHTopoNER model and bidirectional FSM. After analysis, the address-parsing method showed good scalability and adaptability, which could be applied to various types of address-parsing tasks.
Databáze: Directory of Open Access Journals