The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing
Author: | Salvador España-Boquera, Maria Jose Castro-Bleda, Joan Pastor-Pellicer, Francisco Zamora-Martínez |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: | General Computer Science; Computer science; Noise reduction; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image processing; 02 engineering and technology; Binarization; 01 natural sciences; 010309 optics; 0103 physical sciences; Machine learning; 0202 electrical engineering, electronic engineering, information engineering; Denoising; Artificial neural network; Deep learning; Pattern recognition; Optical character recognition; Super-resolution; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 020201 artificial intelligence & image processing; Artificial intelligence; LENGUAJES Y SISTEMAS INFORMATICOS; Neural networks |
Source: | RiuNet. Repositorio Institucional de la Universitat Politècnica de València |
DOI: | 10.1093/comjnl/bxz098 |
Description: | [EN] This paper presents the 'NoisyOffice' database. It consists of images of printed text documents with noise mainly caused by uncleanliness in a generic office, such as coffee stains and footprints on documents, or folded and wrinkled sheets with degraded printed text. This corpus is intended for training and evaluating supervised learning methods for cleaning, binarization and enhancement of noisy grayscale images of text documents. As an example, several image enhancement and binarization experiments using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images using the database is described to show its suitability for benchmarking image processing systems. This research was undertaken as part of the project TIN2017-85854-C4-2-R, jointly funded by the Spanish MINECO and FEDER funds. |
Database: | OpenAIRE |