Performance evaluation of different features and classifiers for Gurumukhi newspaper text recognition.

Author: Kaur, Rupinder Pal; Kumar, Munish; Jindal, M. K.
Source: Journal of Ambient Intelligence & Humanized Computing; Aug 2023, Vol. 14 Issue 8, p10245-10261, 17p
Abstract: Document analysis has long been an area of great interest for researchers keen to develop new techniques for archiving the important information printed or handwritten on various documents. Newspapers are one such source, comprising chronicled as well as analytical data. Archiving this kind of information through optical character recognition (OCR) can benefit us in the future. In OCR, the printed or handwritten text under consideration passes through many phases to extract the recognizable unit, and the characters are finally recognized to produce a computer-processable form of the text. Feature extraction and classification are significant stages, in which features of a segmented character image are extracted and fed to a classifier for identification. In the present work, various feature extraction and classification techniques have been implemented to recognize newspaper text printed in Gurumukhi script. Six feature extraction techniques, namely zoning, diagonal, centroid, power curve fitting, parabola curve fitting and peak extent, have been used to extract features from a character image. Four classifiers, namely k-nearest neighbor, multilayer perceptron, decision tree and random forest, have been explored for classification. The feature extraction techniques and classifiers are evaluated on the recognition results obtained. A maximum recognition accuracy of 96.9% has been obtained using diagonal features with the random forest classifier. [ABSTRACT FROM AUTHOR]
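As a rough illustration of the pipeline the abstract describes (features extracted from a segmented character image and fed to a classifier), the following Python sketch pairs one common formulation of diagonal features with a k-nearest-neighbor classifier. It makes several assumptions that do not come from the paper: the binary image encoding, the default zone size of 10, the per-zone averaging of diagonal sums, and the hypothetical function names diagonal_features and knn_predict. The paper reports its best result with random forest. k-NN, one of the other three classifiers it names, is used here only because it fits in a few self-contained lines.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical sketch: the image encoding, zone size and function names are
# illustrative assumptions, not details taken from the paper.
Image = List[List[int]]  # binary character image: 1 = ink, 0 = background


def diagonal_features(img: Image, zone: int = 10) -> List[float]:
    """Split the image into zone x zone blocks. In each block, sum the ink
    pixels along each of its 2*zone - 1 diagonals and keep the mean of those
    sums as that block's single feature."""
    rows, cols = len(img), len(img[0])
    if rows % zone or cols % zone:
        raise ValueError("image height and width must be multiples of zone")
    feats: List[float] = []
    for r0 in range(0, rows, zone):
        for c0 in range(0, cols, zone):
            diag_sums = [0] * (2 * zone - 1)
            for i in range(zone):
                for j in range(zone):
                    # pixels with equal i + j lie on the same diagonal line
                    diag_sums[i + j] += img[r0 + i][c0 + j]
            feats.append(sum(diag_sums) / len(diag_sums))
    return feats


def knn_predict(train: Sequence[Tuple[List[float], str]],
                query: List[float], k: int = 3) -> str:
    """Majority vote among the k training vectors closest (Euclidean) to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, a 100 x 100 character image with zone = 10 yields 100 zones and therefore a 100-value feature vector per character. Those vectors, labeled with their characters, form the training set passed to knn_predict.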
Database: Complementary Index