Beyond OCRs for Document Blur Estimation

Authors: Sajal Maheshwari, Pranjal Kumar Rai, Vineet Gandhi, Parikshit Sakurikar, Ishit Mehta
Year of publication: 2017
Subject:
Source: ICDAR
DOI: 10.1109/icdar.2017.182
Description: The current document blur/quality estimation algorithms rely on the OCR accuracy to measure their success. A sharp document image, however, at times may yield lower OCR accuracy owing to factors independent of blur or quality of capture. The necessity to rely on OCR is mainly due to the difficulty in quantifying the quality otherwise. In this work, we overcome this limitation by proposing a novel dataset for document blur estimation, for which we physically quantify the blur using a capture set-up which computationally varies the focal distance of the camera. We also present a selective search mechanism to improve upon the recently successful patch-based learning approaches (using codebooks or convolutional neural networks). We present a thorough analysis of the improved blur estimation pipeline using correlation with OCR accuracy as well as the actual amount of blur. Our experiments demonstrate that our method outperforms the current state-of-the-art by a significant margin.
Database: OpenAIRE