Deep semantic binarization for document images.

Author: Mondal, Ajoy; Reddy, Chetan; Jawahar, C. V.
Subject:
Source: Multimedia Tools & Applications; Feb 2023, Vol. 82 Issue 5, p6531-6555, 25p
Abstract: Binarization is an essential pre-processing step for many document image analysis tasks. Binarization of handwritten documents is more challenging than that of printed documents because of the non-uniform density of ink and the variable thickness of strokes. Instead of traditional scanners, people nowadays use mobile cameras to capture documents, including text written on whiteboards and glass boards. The quality of camera-captured document images is often poor compared with scanned document images, which reduces binarization accuracy. This paper presents a deep learning-based binarization framework called Deep Semantic Binarization (DSB) to binarize various document images. We pose the document image binarization problem as a pixel-wise two-class classification task. Deep networks (including DSB) require many training images. However, the publicly available benchmark datasets contain only a limited number of training images. We explore various training strategies, including transfer learning, to handle this data scarcity. Because no mobile-captured whiteboard and glass board image datasets were available, we created two datasets, namely WBIDS-IIIT and GBIDS-IIIT, with associated ground truths. We validate DSB on the public benchmark DIBCO dataset and on the WBIDS-IIIT and GBIDS-IIIT datasets. We empirically demonstrate that DSB outperforms the state-of-the-art techniques on WBIDS-IIIT, GBIDS-IIIT and the public datasets. [ABSTRACT FROM AUTHOR]
Database: Complementary Index