Key Information Extraction and Recognition from Rich Text Images

Author:	Tien Do, Thuyen Tran Doan, Khiem Le, Thua Nguyen, Duy-Dinh Le, Thanh Duc Ngo
Language:	English
Publication year:	2024
Subject:	Document information extraction and localization (KIE and KILE); line item recognition (LIR); Information technology, T58.5-58.64; Electronic computers. Computer science, QA75.5-76.95
Source:	Vietnam Journal of Computer Science, Vol 11, Iss 04, Pp 569-594 (2024)
Document type:	article
ISSN:	2196-8888, 2196-8896
DOI:	10.1142/S2196888824500131
Description:	Key information extraction and recognition from rich text images are crucial for various applications. The process involves two main tasks: Line Item Recognition (LIR) and Key Information Localization and Extraction (KILE). LIR aims to identify and interpret the data line items in a document. The essential information in each line item is then classified or extracted, a task known as KILE. A widely used approach to this problem is sequence-based; it relies on the generalization ability of a language model and requires a significant amount of training time. We present an effective and reliable solution that uses RoBERTa, a transformer model trained on a large corpus, together with the LION optimizer to improve the training process. A comprehensive evaluation was conducted on two benchmarks covering two languages, English and Vietnamese. Experimental results on DocILE indicate that the proposed framework significantly improves the KILE task, with a 7.24% increase in accuracy over the baseline, and also raises the correct recognition rate at the LIR stage. On MCOCR, the method achieved a Character Error Rate (CER) of 28.6%, which is competitive with the state of the art on this dataset.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/0e9b2df7adef4ee398c0ad3e324171cc
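
The description above reports results as a Character Error Rate (CER). As a reading aid, the following is a minimal sketch of how CER is commonly computed: the character-level edit distance between the recognized text and the reference, divided by the reference length. It assumes this standard definition; the record does not state the exact evaluation protocol used on MCOCR, and the function names `levenshtein` and `cer` are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed standard CER)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

Under this definition a lower value is better, and the value can exceed 1.0 when the recognized text is much longer than the reference.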