A system for detection of moving caption text in videos: a news use case
Authors: | Mohsen A. Rashwan, Hossam Elshahaby |
---|---|
Year of publication: | 2021 |
Subject: | Deblurring; Artificial neural network; Point (typography); Computer Networks and Communications; Plain text; business.industry; Computer science; Interpolation (computer graphics); ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020207 software engineering; 02 engineering and technology; computer.file_format; Convolutional neural network; Hough transform; law.invention; Recurrent neural network; Hardware and Architecture; law; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Computer vision; Artificial intelligence; business; computer; Software |
Source: | Multimedia Tools and Applications. 80:25607-25631 |
ISSN: | 1573-7721 1380-7501 |
Description: | Extracting news text captions gives a digital understanding of what is happening in a specific region during a certain period, which supports better communication between nations because plain text is easy to translate from one language to another. Moving text captions cause blurring, which is a significant source of text quality impairment on news channels. Most existing caption detection models do not address this problem in a way that captures the different dynamic motions of captions, gathers a full news story across several frames in a sequence, resolves the blurring caused by text motion, offers a language-independent model, or provides an end-to-end solution for the community to use. We process the incoming frames in sequence and extract edge features using either the Hough transform or our color-based technique. We verify the presence of text using a pre-trained Convolutional Neural Network (CNN) text detection model. We analyze the caption motion status using a hybrid of a pre-trained Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) model and a correlation-based model. When the motion is determined to be horizontal rotation, two problems arise. First, the text keeps rotating without stopping, which produces strong blurring that degrades text quality and consequently lowers character recognition accuracy. Second, successive news stories are separated by the channel logo or by long spaces. We solve the first problem by deblurring the text image using either Bicubic Spline Interpolation (BSI) or a Denoising Autoencoder Neural Network (DANN). We solve the second problem using Point Feature Matching (PFM) to match the detected channel logo against a database of channel logos (ground truth). We evaluate our framework using the Abbyy® SDK as a standalone text recognition tool that supports different languages. |
Database: | OpenAIRE |
External link: |
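
The description mentions a correlation-based model for analyzing caption motion but gives no details. As a rough illustration only, and not the authors' implementation, the sketch below estimates how many pixels a horizontally scrolling caption strip has moved between two frames by correlating column-intensity profiles. The function names, the leftward-scroll assumption, and the list-of-rows grayscale input format are all hypothetical choices made for this example.

```python
def column_profile(strip):
    """Sum pixel intensities down each column of a 2-D grayscale strip (list of rows)."""
    return [sum(col) for col in zip(*strip)]


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return num / den if den > 0 else 0.0


def estimate_horizontal_shift(prev_strip, curr_strip, max_shift):
    """Return (shift, score) for the leftward shift in [0, max_shift] that best aligns the strips.

    Assumption: the caption scrolls to the left, so curr[x] matches prev[x + shift]
    on the overlapping columns. A shift of 0 suggests a static caption.
    """
    prev = column_profile(prev_strip)
    curr = column_profile(curr_strip)
    best_shift, best_score = 0, -1.0
    for s in range(max_shift + 1):
        overlap = len(prev) - s
        if overlap < 2:
            break  # too few columns left to correlate
        score = normalized_correlation(prev[s:], curr[:overlap])
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift, best_score


# Synthetic two-row caption strip and the same strip scrolled 3 pixels to the left.
row = [0, 0, 9, 0, 5, 5, 0, 0, 7, 0, 0, 3, 0, 0, 0, 0]
prev_strip = [row, row]
curr_strip = [row[3:] + [0, 0, 0]] * 2
shift, score = estimate_horizontal_shift(prev_strip, curr_strip, max_shift=5)
```

In a real pipeline the strips would be the detected caption region cropped from consecutive frames, and the per-frame shift could be one input for deciding between static and scrolling captions. How the published system combines its correlation model with the LSTM is not stated in this record.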