Meitei Mayek handwritten dataset: compilation, segmentation, and character recognition.

Authors: Inunganbi, Sanasam; Choudhary, Prakash; Manglem, Khumanthem
Source: Visual Computer; Feb2021, Vol. 37 Issue 2, p291-305, 15p
Abstract: Meitei Mayek, a distinctive Indian script, has experienced a resurgence in the last few years but has received very little attention in handwriting research because of its recent revival and limited sources. The objective of this paper is twofold. First, it develops two datasets: Mayek27, containing 4900 isolated Meitei Mayek characters, and the MM (Meitei Mayek) dataset of 189 full-length handwritten text pages. Second, it develops a recognition system on the Mayek27 dataset using a convolutional neural network, together with segmentation algorithms (text-lines, words, and characters) for the full-length Meitei Mayek handwritten text. A recognition rate of 99.02% is achieved using three convolutional layers with 3 × 3 filters and 16, 32, and 96 kernels. On the MM text dataset, text-line and word segmentation are performed concurrently on 809 lines with a novel approach that tracks the space between lines using the horizontal projection histogram while monitoring the vertical projection histogram along the run-length of the segmentation. Difficult cases such as skewed, curved, closely spaced, and touching text-lines are taken into account, and the segmentation algorithm scores 91.84% and 88.96% for text-lines and words, respectively. Furthermore, characters are segmented by headline removal and connected component analysis, achieving an accuracy of 91.12%. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
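
Note: the abstract mentions a horizontal projection histogram for locating text-lines. The following is only a minimal sketch of that general idea in Python, not the authors' algorithm: it assumes a clean, unskewed binary page (1 = ink, 0 = background) and does not handle the skewed, curved, or touching lines the paper addresses. The names `horizontal_projection`, `segment_lines`, and `min_ink` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, one per text-line.

    A text-line is taken to be a maximal run of consecutive rows whose
    ink count is at least `min_ink` (a hypothetical threshold).
    """
    profile = horizontal_projection(image)
    lines: List[Tuple[int, int]] = []
    start = None  # row where the current run of inked rows began
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # entering a text-line
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # leaving a text-line
            start = None
    if start is not None:              # page ends inside a text-line
        lines.append((start, len(profile)))
    return lines


# Toy 6 x 5 "page" with two text-lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

Applying the same run-finding to column sums inside one detected line (a vertical projection) would give a crude word split, under the same clean-input assumption.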