OCR of Printed Telugu Text with High Recognition Accuracies.

Authors: Kalra, Prem; Peleg, Shmuel; Vasantha Lakshmi, C.; Jain, Ritu; Patvardhan, C.
Source: Computer Vision, Graphics & Image Processing; 2006, p786-795, 10p
Abstract: Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Development of Optical Character Recognition (OCR) systems for Telugu text is an area of current research. OCR of Indian scripts is much more complicated than OCR of Roman script because of the huge number of combinations of characters and modifiers. Basic symbols are identified as the unit of recognition in Telugu script. Edge histograms are used in a feature-based recognition scheme for these basic symbols. During recognition, it is observed that in many cases the recognizer incorrectly outputs a very similar-looking symbol. Special logic and algorithms using simple structural features are developed to improve recognition accuracy considerably without much additional computational effort. It is shown that a recognition accuracy of 98.5% can be achieved on laser-quality prints with such a procedure. [ABSTRACT FROM AUTHOR]
Database: Supplemental Index