Gamma Enhanced Binarization - An Adaptive Nonlinear Enhancement of Degraded Word Images for Improved Recognition of Split Characters
Authors: | H. R. Shiva Kumar, A. G. Ramakrishnan |
---|---|
Publication year: | 2019 |
Subject: | Computer science; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; 02 engineering and technology; 01 natural sciences; Unicode; language.human_language; 010309 optics; Kannada; Support vector machine; Nonlinear system; ComputingMethodologies_PATTERNRECOGNITION; 0103 physical sciences; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Tesseract; Artificial intelligence; business; Classifier (UML) |
Source: | NCC |
Description: | The recognition performance of any OCR system suffers from the merged and split characters that occur in scanned images of degraded printed documents. We propose an elegant method of nonlinearly enhancing such degraded gray-scale word images. The enhancement connects the broken strokes of characters, so that binarizing the processed word images yields components with better connectivity for most characters or recognizable units. The parameter that determines the enhancement, gamma, starts at one and is decreased in powers of 2; the value to use is chosen based on the recognition score of our character classifier. We have created a benchmark dataset of 1685 degraded word images obtained from scanned pages of several old Kannada books. The word images have been recognized before and after the proposed nonlinear enhancement. The proposed enhancement of the gray-scale word images yields an absolute improvement of 14.8% in the Unicode-level recognition accuracy of our SVM-based character classifier on this dataset. Even with Google's Tesseract OCR for Kannada, our gamma-enhanced binarization improves Unicode-level accuracy by 5.6%. |
Database: | OpenAIRE |
External link: |
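
The gamma search outlined in the description can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. Several details are assumptions that the record does not state: the power law is applied to ink density (the inverted, normalized gray level) so that gamma below one darkens faint strokes, binarization uses a fixed global threshold of 128, and `score_fn` is a hypothetical stand-in for the recognition score of a character classifier. The function names are likewise invented for illustration.

```python
import numpy as np


def gamma_enhance(gray, gamma):
    """Power-law enhancement of a gray-scale image (uint8, dark text on light paper).

    Assumption: the power law acts on ink density (1 - normalized intensity),
    so gamma < 1 strengthens faint strokes. The record does not specify this.
    """
    ink = 1.0 - gray.astype(np.float64) / 255.0
    enhanced = ink ** gamma
    return np.rint((1.0 - enhanced) * 255.0).astype(np.uint8)


def binarize(gray, threshold=128):
    """Fixed global threshold (an assumption); True marks ink pixels."""
    return gray < threshold


def gamma_candidates(steps=4):
    """Gamma values starting at one and decreasing in powers of 2."""
    return [2.0 ** -k for k in range(steps)]


def select_gamma(gray, score_fn, steps=4):
    """Return (gamma, score, binary) for the candidate with the highest score.

    score_fn is a hypothetical callable mapping a binary image to a
    recognition score; ties keep the larger (earlier) gamma.
    """
    best = None
    for gamma in gamma_candidates(steps):
        binary = binarize(gamma_enhance(gray, gamma))
        score = score_fn(binary)
        if best is None or score > best[1]:
            best = (gamma, score, binary)
    return best
```

In use, `score_fn` would wrap whatever recognizer is available (for example, the confidence reported by an SVM classifier or an OCR engine), and `select_gamma` would be called once per word image.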