The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

Authors: Salvador España-Boquera, Maria Jose Castro-Bleda, Joan Pastor-Pellicer, Francisco Zamora-Martínez
Language: English
Year of publication: 2020
Subject:
Source: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
DOI: 10.1093/comjnl/bxz098
Description: [EN] This paper presents the "NoisyOffice" database. It consists of images of printed text documents with noise mainly caused by uncleanliness in a generic office, such as coffee stains and footprints on documents, or folded and wrinkled sheets with degraded printed text. The corpus is intended for training and evaluating supervised learning methods for cleaning, binarization and enhancement of noisy grayscale images of text documents. As an example, several image enhancement and binarization experiments using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images using the database is described to show its suitability for benchmarking image processing systems.
This research was undertaken as part of the project TIN2017-85854-C4-2-R, jointly funded by the Spanish MINECO and FEDER funds.
Database: OpenAIRE