Khmer printed character recognition using attention-based Seq2Seq network

Authors: Rina Buoy, Nguonly Taing, Sovisal Chenda, Sokchea Kor
Language: English
Year of publication: 2022
Subject:
Source: Ho Chi Minh City Open University Journal of Science - Engineering and Technology, Vol 12, Iss 1, Pp 3-16 (2022)
Document type: article
ISSN: 2734-9330
2734-9608
DOI: 10.46223/HCMCOUJS.tech.en.12.1.2217.2022
Description: This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded in a single context vector and a sequence of hidden states, which are fed to the decoder; the decoder emits one character at a time until it produces a special end-of-sentence (EOS) token. The attention mechanism allows the decoder network to adaptively select relevant parts of the input image while predicting a target character. The Seq2Seq Khmer OCR network is trained on a large collection of computer-generated text-line images in multiple common Khmer fonts. Complex data augmentation is applied to both the training and validation datasets. On a validation set of 6400 augmented images, the proposed model outperforms the state-of-the-art Tesseract OCR engine for the Khmer language, achieving a character error rate (CER) of 0.7% versus 35.9% for Tesseract.
Database: Directory of Open Access Journals
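
Note: The abstract reports results as a character error rate (CER) but does not say how it is computed. The sketch below shows one common definition, given here as an assumption rather than the authors' procedure: the total Levenshtein edit distance between reference and predicted text lines, divided by the total number of reference characters. The function names `levenshtein` and `cer` are illustrative, and characters are counted as Unicode code points, which may differ from how the paper segments Khmer text.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (counted per Unicode code point)."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a call such as `cer(ground_truth_lines, predicted_lines)` returns a fraction; multiplying by 100 gives a percentage comparable in form to the 0.7% and 35.9% figures quoted in the description.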