A novel pipeline framework for multi oriented scene text image detection and recognition
Author: | Vahid Ghods, Fatemeh Naiemi, Hassan Khalesi |
---|---|
Publication year: | 2021 |
Subject: |
0209 industrial biotechnology;
Pixel; Computer science; business.industry; General Engineering; Process (computing); Pattern recognition; 02 engineering and technology; Pipeline (software); Convolutional neural network; Computer Science Applications; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; Word (computer architecture) |
Source: | Expert Systems with Applications. 170:114549 |
ISSN: | 0957-4174 |
Description: | Automatic text detection and recognition (end-to-end text recognition) in real-life images is a core component of many applications, including assistance systems for blind and low-vision users and self-driving cars. However, curved and vertical text is challenging to detect because of color bleeding, font size variation, and complicated backgrounds. In this paper, a convolutional neural network-based pipeline is introduced to obtain high-level visual features and improve text detection and recognition efficiency. A ResNet-50 network pre-trained on ImageNet and SynthText was used to extract low-level visual features. Moreover, the proposed structure uses new improved ReLU layer (new.i.ReLU) blocks with varied receptive fields, which have a strong ability to detect text components even on curved surfaces. New improved inception layers (new.i.inception layers) can capture text of widely varying sizes more effectively than a linear chain of convolution layers. We also propose a pipeline framework for character recognition that is robust to irregular (curved and vertical) text. First, we introduce a novel algorithm, called local word directional pattern (LWDP), that re-encodes each pixel’s value to highlight the texture of the characters. The LWDP output is then used as the input image for the text recognition process. Experiments on standard benchmarks, including the ICDAR 2013, ICDAR 2015, and ICDAR 2019 datasets, showed the superiority of the proposed architecture over prior work. |
Database: | OpenAIRE |
External link: |