A system for detection of moving caption text in videos: a news use case

Authors:	Mohsen A. Rashwan, Hossam Elshahaby
Year of publication:	2021
Subject:	Deblurring; Artificial neural network; Point (typography); Computer Networks and Communications; Plain text; Computer science; Interpolation (computer graphics); ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020207 software engineering; 02 engineering and technology; Convolutional neural network; Hough transform; Recurrent neural network; Hardware and Architecture; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Computer vision; Artificial intelligence; Software
Source:	Multimedia Tools and Applications. 80:25607-25631
ISSN:	1573-7721 1380-7501
Description:	Extracting news text captions supports a digital understanding of what is happening in a specific region during a given period. Because plain text is easy to translate from one language to another, this helps communication between different nations. Moving text captions cause blurring, which is a significant source of text quality impairment on news channels. Most existing caption detection models do not address this problem in a way that captures the different dynamic motions of captions, gathers a full news story across several frames in the sequence, resolves the blurring caused by text motion, offers a language-independent model, or provides an end-to-end solution for the community to use. We process the frames in sequence and extract edge features using either the Hough transform or our color-based technique. We verify the presence of text using a pre-trained Convolutional Neural Network (CNN) text detection model. We analyze the caption motion status using a hybrid of a pre-trained Recurrent Neural Network (RNN) model of the Long Short-Term Memory (LSTM) type and a correlation-based model. When the motion is determined to be horizontal rotation, two problems arise. First, the text keeps rotating without stopping, which produces strong blurring that degrades text quality and consequently lowers character recognition accuracy. Second, successive news stories are separated by the channel logo or by long spaces. We solve the first problem by deblurring the text image using either the Bicubic Spline Interpolation (BSI) technique or a Denoising Autoencoder Neural Network (DANN). We solve the second problem using a Point Feature Matching (PFM) technique to match the detected channel logo against the channels' logo database (ground truth). We evaluate our framework using the Abbyy® SDK as a standalone text recognition tool that supports different languages.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::b117ee0915710691e308cbd07f0d3448 https://doi.org/10.1007/s11042-021-10856-6
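
The description mentions a correlation-based model for analyzing caption motion but gives no details. As a rough illustration only, and not the authors' method, the sketch below estimates the horizontal shift of a caption strip between two consecutive frames by comparing column-intensity profiles with normalized cross-correlation. The function names `column_profile` and `estimate_shift`, the `max_shift` parameter, and the representation of a strip as a list of pixel rows are all assumptions made for this example.

```python
def column_profile(strip):
    """Sum grayscale intensities down each column of a strip (list of rows)."""
    return [sum(col) for col in zip(*strip)]


def estimate_shift(prev_strip, next_strip, max_shift):
    """Return (shift, score) for the best horizontal alignment.

    A negative shift means the content moved left between the two frames,
    a positive shift means it moved right. The score is the normalized
    cross-correlation of the overlapping parts of the two column profiles.
    Hypothetical helper for illustration; not taken from the paper.
    """
    a = column_profile(prev_strip)
    b = column_profile(next_strip)
    n = len(a)
    # Remove the mean so uniform background brightness does not dominate.
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    a = [x - mean_a for x in a]
    b = [x - mean_b for x in b]

    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Column i of the previous frame is compared with column i + s
        # of the next frame; keep only indices valid for both profiles.
        lo, hi = max(0, -s), min(n, n - s)
        if hi - lo < 2:
            continue
        num = sum(a[i] * b[i + s] for i in range(lo, hi))
        den = (sum(a[i] ** 2 for i in range(lo, hi))
               * sum(b[i + s] ** 2 for i in range(lo, hi))) ** 0.5
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift, best_score


# Toy example: a two-row strip whose content scrolls 3 pixels to the left.
prev_strip = [[0, 0, 0, 0, 9, 1, 7, 0, 0, 3, 0, 0, 0, 0, 0, 0]] * 2
next_strip = [row[3:] + [0, 0, 0] for row in prev_strip]
print(estimate_shift(prev_strip, next_strip, max_shift=5))
```

Under these assumptions, a shift near zero over several frames would suggest a static caption, while a steady non-zero shift would suggest a scrolling one. How the paper combines such a signal with the LSTM model is not stated in the record.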