A study on font-family and font-size recognition applied to Arabic word images at ultra-low resolution
Authors: | Fouad Slimane, Adel M. Alimi, Rolf Ingold, Slim Kanoun, Jean Hennebert |
---|---|
Year of publication: | 2013 |
Subject: |
Arabic
Computer science, Speech recognition, Pattern recognition, Identification (information), Artificial intelligence, Signal processing, Font, Word recognition, Computer vision and pattern recognition, Hidden Markov model, Cursive, Software, Word (computer architecture) |
Source: | Pattern Recognition Letters. 34:209-218 |
ISSN: | 0167-8655 |
DOI: | 10.1016/j.patrec.2012.09.012 |
Description: | In this paper, we propose a new font and size identification method for ultra-low resolution Arabic word images using a stochastic approach. The literature has shown that Arabic text recognition systems struggle with multi-font and multi-size word images. This is due to the variability introduced by certain font families, in addition to the inherent difficulties of Arabic writing, including cursive representation, overlaps and ligatures. This research work proposes an efficient stochastic approach to tackle the problem of font and size recognition. Our method processes a word image with a fixed-length, overlapping sliding window. Each window is represented by 102 features whose distribution is captured by Gaussian Mixture Models (GMMs). We present three systems: (1) a font recognition system, (2) a size recognition system and (3) a font and size recognition system. We demonstrate the importance of font identification before recognizing the word images with two multi-font Arabic OCR systems (cascading and global). The cascading system is about 23% better than the global multi-font system in terms of word recognition rate on the Arabic Printed Text Image (APTI) database, which is freely available to the scientific community. |
Database: | OpenAIRE |
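
The description mentions a fixed-length, overlapping sliding window whose feature vectors are modelled by GMMs. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a binary image stored as a list of rows, uses a hypothetical stand-in feature (mean ink density per window column) in place of the paper's 102 features, and assumes diagonal-covariance GMMs whose parameters are already known. All function names and the toy parameter values are illustrative only.

```python
import math


def sliding_windows(image, width, step):
    """Yield fixed-width, overlapping column windows of a 2D image (list of rows)."""
    n_cols = len(image[0])
    for start in range(0, n_cols - width + 1, step):
        yield [row[start:start + width] for row in image]


def window_features(window):
    """Hypothetical stand-in features: mean ink density of each window column."""
    height = len(window)
    return [sum(row[c] for row in window) / height for c in range(len(window[0]))]


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of vector x under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify(image, models, width=2, step=1):
    """Return the label whose GMM gives the highest total log-likelihood."""
    feats = [window_features(w) for w in sliding_windows(image, width, step)]
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in feats)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)


# Toy example: two single-component models with made-up parameters.
toy_models = {
    "dense_font": [(1.0, [0.8, 0.8], [0.05, 0.05])],
    "light_font": [(1.0, [0.2, 0.2], [0.05, 0.05])],
}
toy_image = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
print(classify(toy_image, toy_models))
```

In a real system the GMM parameters would be estimated from training data (for example with expectation-maximization), one model per font, per size, or per font and size pair. Summing per-window log-likelihoods treats the windows as independent, which is a simplifying assumption of this sketch.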