A system for detection of moving caption text in videos: a news use case

Autor: Mohsen A. Rashwan, Hossam Elshahaby
Rok vydání: 2021
Předmět:
Zdroj: Multimedia Tools and Applications. 80:25607-25631
ISSN: 1573-7721
1380-7501
Popis: Extraction of news text captions aims at a digital understanding of what is happening in a specific region during a certain period that helps in better communication between different nations because we can easily translate the plain text from one language to another. Moving text captions causes blurry effects that are a significant cause of text quality impairments in the news channels. Most of the existing text caption detection models do not address this problem in a way that captures the different dynamic motion of captions, gathers a full news story among several frames in the sequence, resolves the blurring effect of text motion, offers a language-independent model, or provides it as an end-to-end solution for the community to use. We process the frames coming in sequence and extract edge features using either the Hough transform or our color-based technique. We verify text existence using a Convolutional Neural Network (CNN) text detection pre-trained model. We analyze the caption motion status using hybrid pre-trained Recurrent Neural Network (RNN) of Long Short-Term Memory (LSTM) type model and correlation-based model. In case the motion is determined to be horizontal rotation, there are two problems. First, it means that text keeps rotating with no stop resulting in a high blurring effect that affects the text quality and consequently resulting in low character recognition accuracy. Second, there are successive news stories which are separated by the channel logo or long spaces. We managed to solve the first problem by deblurring the text image using either Bicubic Spline Interpolation (BSI) technique or the Denoising Autoencoder Neural Network (DANN). We solved the second problem using a Point Feature Matching (PFM) technique to match the existing channel logo with the channels’ logo database (ground truth). We evaluate our framework using Abbyy® SDK as a standalone tool used for text recognition supporting different languages.
Databáze: OpenAIRE