Research on a Web System Data-Filling Method Based on Optical Character Recognition and Multi-Text Similarity

Authors: Hailu Su, Ruiqing Kang, Yunli Fan
Language: English
Year of publication: 2024
Subject:
Source: Applied Sciences, Vol 14, Iss 3, p 1034 (2024)
Document type: article
ISSN: 2076-3417
DOI: 10.3390/app14031034
Description: In the development of web systems, data uploading is an important function. The traditional method of uploading data is to fill out forms manually. However, when the data to be uploaded exist mostly as form images, and the form content contains many similar fields and irrelevant edge information, the traditional method is time-consuming, labor-intensive, and error-prone. This calls for a technology that can automatically fill in data from complex form images. Optical character recognition (OCR) converts images into digitized text data using computer vision methods. However, OCR alone cannot extract the relevant data and fill the corresponding fields. To address this issue, this article proposes a method that combines OCR technology with Levenshtein multi-text similarity. The method effectively solves the problem of data filling after parsing complex form images, and its application in web systems shows that the filling accuracy for complex form images can reach over 90%.
Database: Directory of Open Access Journals
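
Illustrative note: the abstract describes pairing OCR output with Levenshtein-based text similarity to decide which form field each recognized value belongs to. The record does not give the authors' algorithm, so the following is only a minimal sketch of the general idea, not the published method. It assumes the OCR step has already produced (label, value) text pairs; the function names `levenshtein`, `similarity`, and `fill_fields`, the normalization by the longer string length, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fill_fields(ocr_pairs, field_names, threshold=0.6):
    """Assign each OCR (label, value) pair to its most similar web form field.

    Pairs whose best score is below the (hypothetical) threshold are treated
    as irrelevant edge information and skipped.
    """
    best_so_far = {}
    for label, value in ocr_pairs:
        scored = [(similarity(label.lower(), name.lower()), name) for name in field_names]
        score, name = max(scored)
        if score < threshold:
            continue
        # If two OCR labels compete for one field, keep the closer match.
        if name not in best_so_far or score > best_so_far[name][0]:
            best_so_far[name] = (score, value)
    return {name: value for name, (score, value) in best_so_far.items()}


# Hypothetical OCR output: one misread label and one piece of page furniture.
ocr_pairs = [("Appl1cant Name", "Hailu"), ("Page 1 of 3", "")]
form_fields = ["Applicant Name", "Application Date"]
result = fill_fields(ocr_pairs, form_fields)
```

In this sketch the misread label "Appl1cant Name" is still matched to the "Applicant Name" field because only one character differs, while "Page 1 of 3" falls below the threshold and is ignored. A real system would need a threshold tuned on actual form images.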