Extended framework for Sindhi numerals OCR using gradient orientation histograms

Authors: Anwar Ali Sanjrani, Junaid Baber, Maheen Bakhtyar, Ihsan Ullah, M. Shumail Naveed, Waheed Noor, Abdul Basit, Azam Khan, Naveed Sheikh
Publication year: 2022
Subject:
Source: Journal of Intelligent & Fuzzy Systems. 43:2045-2056
ISSN: 1875-8967, 1064-1246
Description: Accuracy on the MNIST dataset for Roman numerals has already reached 99.65%. However, the same models show low accuracy on Sindhi numerals, because the shapes of Sindhi numerals are highly correlated with one another. In this paper, correlation-based template matching is used to analyze this shape ambiguity by identifying the dominant false positives (FP) and false negatives (FN) for every numeral. Furthermore, Gradient Orientation Histogram (GOH) features are used to improve the accuracy of existing classifiers through image-to-image matching. Classical OCR using simple binary features is not sufficient to address shape ambiguity in Sindhi numerals; for example, the shapes of the digits 2 and 3 are very similar. In the first stage, raw pixel values are used as features for classification. In the second stage, the input image is matched against the dominant FP and FN of the predicted class, and the final decision is made by image-to-image matching based on GOH features. Basing the decision on image-to-image matching with the dominant FP and FN increases the accuracy of the classifier. Support vector machine (SVM), k-nearest neighbor, and template-matching classifiers are used. The proposed extension substantially improves the accuracy of all of these classifiers.
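The two-stage decision summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 9-bin global histogram, the Euclidean distance, the 1-nearest-neighbor first stage, and the `confusions` and `references` dictionaries are all assumptions made for illustration, since the record gives none of these details.

```python
import numpy as np


def goh_features(img, bins=9):
    """Global gradient orientation histogram of a 2-D grayscale image.

    Assumption: one magnitude-weighted histogram over the whole image,
    using unsigned orientations in [0, pi) and L2 normalisation.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                # derivatives along rows, columns
    mag = np.hypot(gx, gy)                   # gradient magnitude per pixel
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # fold direction into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def first_stage_predict(x, train_images, train_labels):
    """Stage 1: classify on raw pixel values.

    Assumption: a 1-nearest-neighbour rule stands in for the SVM, k-NN,
    or template-matching classifier mentioned in the description.
    """
    flat = train_images.reshape(len(train_images), -1)
    dists = np.linalg.norm(flat - np.ravel(x), axis=1)
    return int(train_labels[np.argmin(dists)])


def second_stage_predict(x, predicted, confusions, references):
    """Stage 2: re-decide among the predicted class and its dominant FP/FN.

    `confusions` (hypothetical) maps a class to the classes it is most
    often confused with; `references` (hypothetical) maps a class to a
    list of reference images for that class.
    """
    candidates = [predicted] + list(confusions.get(predicted, []))
    fx = goh_features(x)
    dists = {
        c: min(np.linalg.norm(fx - goh_features(r)) for r in references[c])
        for c in candidates
    }
    return min(dists, key=dists.get)
```

In use, the first-stage label would be passed as `predicted`, and the confusion table would be built beforehand from a validation confusion matrix, which is where the dominant FP and FN of each numeral come from.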
Database: OpenAIRE