Multi-Script Video Caption Localization Based on Visual Rhythms.

Authors: Souza, Marcos Roberto; de Almeida Maia, Helena; Souza Santos, Anderson Carlos; Vieira, Marcelo Bernardes; Pedrini, Helio
Subject:
Source: Applied Artificial Intelligence; 2022, Vol. 36 Issue 1, p1-32, 32p
Abstract: Localizing video captions plays an important role in information retrieval for multimedia applications. In this work, we present and evaluate a novel method for localizing video captions using visual rhythms, which enable the representation and analysis of a specific feature over time. We build visual rhythms from the text location maps produced by general text localization methods, which are far more common in the literature than caption-oriented ones. We then process the maps to keep only the captions, generating caption localization masks. To meet the need for a large, standardized dataset, we constructed a new one in which captions in thirteen different scripts are added to video frames, yielding a total of 221 videos with ground truth. Experiments demonstrate that our method achieves competitive results compared with other approaches in the literature. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
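
Illustrative note: the abstract describes building visual rhythms from per-frame text location maps and then filtering them so that only captions remain. The sketch below shows one plausible reading of that idea and is not the authors' implementation. It assumes the text location maps are already available as a binary array of shape (frames, height, width), that each frame is summarized by its row-wise mean, and that a caption is any row activity persisting for a minimum number of consecutive frames. The function names `row_rhythm` and `keep_persistent` and the parameters `thresh` and `min_frames` are hypothetical.

```python
import numpy as np


def row_rhythm(maps: np.ndarray) -> np.ndarray:
    """Build a row-profile visual rhythm from text location maps.

    maps: array of shape (T, H, W) with values in [0, 1].
    Returns an (H, T) image: column t holds the fraction of text
    pixels in each row of frame t.
    """
    return maps.mean(axis=2).T


def keep_persistent(rhythm: np.ndarray, thresh: float, min_frames: int) -> np.ndarray:
    """Keep only row activity lasting at least min_frames consecutive frames.

    This is an assumed heuristic: captions stay at a fixed position for
    a while, whereas spurious detections tend to be short-lived.
    """
    active = rhythm > thresh
    mask = np.zeros_like(active)
    n_rows, n_frames = active.shape
    for r in range(n_rows):
        t = 0
        while t < n_frames:
            if not active[r, t]:
                t += 1
                continue
            start = t
            while t < n_frames and active[r, t]:
                t += 1
            if t - start >= min_frames:
                mask[r, start:t] = True
    return mask


# Toy input: a caption on rows 4-5 during frames 2-7, plus one
# single-frame false detection on row 0.
maps = np.zeros((10, 6, 8))
maps[2:8, 4:6, 1:7] = 1.0
maps[5, 0, 3] = 1.0

rhythm = row_rhythm(maps)
caption_mask = keep_persistent(rhythm, thresh=0.1, min_frames=3)
```

In this toy case the short-lived detection on row 0 is discarded and only the streak on rows 4 and 5 survives. A column-profile rhythm (mean over `axis=1`) could be filtered the same way to bound the caption horizontally.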