Dominant Color Segmentation of Administrative Document Images by Hierarchical Clustering

Authors: Jean-Christophe Burie, Jean-Marc Ogier, Elodie Carel, Vincent Courboulay
Contributors: Courboulay, Vincent, Laboratoire Informatique, Image et Interaction - EA 2118 (L3I), Université de La Rochelle (ULR)
Language: English
Year of publication: 2013
Subject:
Source: 13th ACM Symposium on Document Engineering (DocEng), Sep 2013, Florence, Italy
Description: This paper addresses the problem of segmenting color document images in an industrial context. Automated Document Recognition (ADR) systems greatly reduce companies' time and resource costs by managing their large volumes of administrative documents and optimizing their workflows. Most of the time, the documents are binarized, a legacy of historical industrial processes. However, colorimetric information can improve the process. In this paper, we propose an approach based on hierarchical clustering to extract dominant color masks from documents. Our dataset comprises different kinds of scanned administrative document images, such as invoices, forms, and letters, and we do not know a priori the number of dominant colors in these documents. The masks are then fed to an OCR engine to provide additional information about the colorimetric context. The approach requires neither user interaction nor parameter-setting steps. Experiments on several types of documents show the relevance of the proposed approach.
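Illustration: the abstract does not specify the color space, the linkage rule, or the stopping criterion, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes RGB pixels, centroid-linkage agglomerative clustering over the distinct colors, and a hypothetical fixed distance `threshold` as the stopping rule (the paper itself states that no parameter setting is required). The function name `dominant_color_masks` is likewise made up for this example. The number of dominant colors is not given in advance; it falls out of where merging stops.

```python
from collections import Counter
from math import dist


def dominant_color_masks(pixels, width, threshold=60.0):
    """Group similar colors bottom-up and return one boolean mask per group.

    pixels    -- flat, row-major list of (r, g, b) tuples
    width     -- image width in pixels
    threshold -- hypothetical stop criterion: merging ends once the two
                 closest cluster centroids are farther apart than this
    """
    # Start with one cluster per distinct color, weighted by pixel count.
    # The pairwise search below is cubic in the number of distinct colors,
    # so a real scan would need its colors quantized first.
    counts = Counter(pixels)
    clusters = [[tuple(map(float, c)), n, {c}] for c, n in counts.items()]

    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are treated as distinct dominant colors

        # Merge j into i, using a pixel-count-weighted centroid.
        (ca, wa, ma), (cb, wb, mb) = clusters[i], clusters[j]
        w = wa + wb
        centroid = tuple((x * wa + y * wb) / w for x, y in zip(ca, cb))
        clusters[i] = [centroid, w, ma | mb]
        del clusters[j]

    # Most frequent color group first.
    clusters.sort(key=lambda c: -c[1])

    results = []
    for centroid, _, members in clusters:
        flat = [p in members for p in pixels]
        mask = [flat[r:r + width] for r in range(0, len(flat), width)]
        results.append((tuple(round(v) for v in centroid), mask))
    return results
```

Each returned pair holds a representative color and a row-by-row boolean mask marking the pixels assigned to that color group; such masks are the kind of per-color layer the abstract describes passing on to OCR.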
Database: OpenAIRE