Gamma Enhanced Binarization - An Adaptive Nonlinear Enhancement of Degraded Word Images for Improved Recognition of Split Characters

Author: H. R. Shiva Kumar, A. G. Ramakrishnan
Year of publication: 2019
Subject:
Source: NCC
Description: The recognition performance of any OCR system suffers from the merged and split characters that occur in scanned images of degraded printed documents. We propose an elegant method for nonlinearly enhancing such degraded gray-scale word images. The enhancement connects the broken strokes of the characters, so that binarizing the processed word images yields components with better connectivity for most characters or recognizable units. Starting from an initial value of one, gamma, the parameter that determines the enhancement, is decreased in powers of 2, and the appropriate value of gamma is chosen based on the recognition score of our character classifier. We have created a benchmark dataset of 1685 degraded word images obtained from scanned pages of several old Kannada books. The word images were recognized both before and after the proposed nonlinear enhancement. The proposed enhancement of the gray-scale word images yields an absolute improvement of 14.8% in the Unicode-level recognition accuracy of our SVM-based character classifier on this dataset. Even with Google's Tesseract OCR for Kannada, our gamma-enhanced binarization improves the Unicode-level accuracy by 5.6%.
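The gamma search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the power-law mapping p -> p**gamma on intensities normalized to [0, 1], the inverted image (ink near 1.0, background near 0.0), the fixed binarization threshold of 0.5, and the toy score are all hypothetical choices. The toy score stands in for the SVM classifier's recognition score by rewarding an expected number of connected components.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # gray levels normalized to [0, 1]
Binary = List[List[int]]   # 1 = ink, 0 = background


def gamma_enhance(img: Image, gamma: float) -> Image:
    """Apply p -> p**gamma. Assumption: image is inverted (ink near 1.0),
    so gamma < 1 lifts faint stroke pixels toward the ink level."""
    return [[p ** gamma for p in row] for row in img]


def binarize(img: Image, threshold: float = 0.5) -> Binary:
    """Fixed-threshold binarization (a hypothetical stand-in for any binarizer)."""
    return [[1 if p > threshold else 0 for p in row] for row in img]


def count_components(binary: Binary) -> int:
    """Count 4-connected ink components with an iterative flood fill."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def choose_gamma(img: Image, score_fn: Callable[[Binary], float],
                 steps: int = 5) -> Tuple[Optional[float], Optional[Binary]]:
    """Try gamma = 1, 1/2, 1/4, ... and keep the best-scoring binarization."""
    best_gamma, best_binary, best_score = None, None, float("-inf")
    for k in range(steps):
        gamma = 2.0 ** -k
        binary = binarize(gamma_enhance(img, gamma))
        score = score_fn(binary)
        # Strict comparison: ties keep the larger gamma (milder enhancement).
        if score > best_score:
            best_gamma, best_binary, best_score = gamma, binary, score
    return best_gamma, best_binary


def toy_score(binary: Binary) -> float:
    """Hypothetical stand-in for classifier confidence: expect one component."""
    return -abs(count_components(binary) - 1)


# A single horizontal stroke with a faint break in the middle.
word = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.9, 0.9, 0.3, 0.9, 0.9],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
gamma, binary = choose_gamma(word, toy_score)
print(gamma)  # 0.5
```

At gamma = 1 the faint pixel (0.3) falls below the threshold and the stroke splits into two components; at gamma = 0.5 it maps to about 0.548 and the stroke is reconnected. Because ties keep the larger gamma, the search settles on the mildest enhancement that reaches the best score. In real images with nonzero background levels, very small gammas would also lift background pixels above the threshold, which shows how over-enhancement could merge neighboring characters.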
Database: OpenAIRE