Video Captioning of Future Frames
Author: | Mehrdad Hosseinzadeh, Yang Wang |
---|---|
Year of publication: | 2021 |
Subject: |
Closed captioning, Computer science, Event (computing), Machine learning, Semantics, Oracle, Task (computing), Task analysis, Artificial intelligence, Baseline (configuration management), Sentence |
Source: | WACV |
DOI: | 10.1109/wacv48630.2021.00102 |
Description: | Being able to anticipate and describe what may happen in the future is a fundamental ability for humans. Given a short clip of a scene showing "a person is sitting behind a piano", humans can describe what will happen afterward, e.g. "the person is playing the piano". In this paper, we consider the task of captioning future events to assess the performance of intelligent models on anticipation and video description generation simultaneously. More specifically, given only the frames of an occurring event (activity), the goal is to generate a sentence describing the most likely next event in the video. We tackle the problem by first predicting the next event in the semantic space of convolutional features, then fusing contextual information into those features and feeding them to a captioning module. Departing from recurrent units allows us to train the network in parallel. We compare the proposed method with a baseline and an oracle method on the ActivityNet-Captions dataset. Experimental results demonstrate that the proposed method outperforms the baseline and is comparable to the oracle method. We perform an additional ablation study to further analyze our approach. |
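The pipeline described above (predict next-event features, fuse context, decode a caption without recurrence) can be sketched in a minimal form. All names, dimensions, and the concatenation-based fusion below are illustrative assumptions, not the paper's actual architecture; the point is only to show how a non-recurrent head can score every word position in parallel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's real feature dimensions are not given here.
FEAT_DIM = 8   # convolutional feature dimension of one event clip
CTX_DIM = 4    # contextual feature dimension
VOCAB = ["<bos>", "the", "person", "is", "playing", "piano", "<eos>"]
MAX_LEN = 5    # fixed caption length for the parallel decoder

# 1) Predict the next event's features in the same semantic feature space
#    (a single linear map stands in for the paper's prediction module).
W_pred = rng.standard_normal((FEAT_DIM, FEAT_DIM)) * 0.1

def predict_future_features(observed_feat):
    return np.tanh(observed_feat @ W_pred)

# 2) Fuse contextual information; concatenation is one plausible choice.
def fuse_context(future_feat, context_feat):
    return np.concatenate([future_feat, context_feat])

# 3) A non-recurrent captioning head: each of the MAX_LEN word positions
#    has its own projection, so all positions are scored at once and
#    training requires no sequential unrolling.
W_cap = rng.standard_normal((MAX_LEN, FEAT_DIM + CTX_DIM, len(VOCAB))) * 0.1

def caption(fused_feat):
    logits = np.einsum("d,tdv->tv", fused_feat, W_cap)  # (MAX_LEN, |VOCAB|)
    return [VOCAB[i] for i in logits.argmax(axis=1)]

observed = rng.standard_normal(FEAT_DIM)  # features of the observed event
context = rng.standard_normal(CTX_DIM)    # contextual features
words = caption(fuse_context(predict_future_features(observed), context))
```

With untrained random weights the output words are meaningless; the sketch only demonstrates that every caption position is produced in one parallel pass, which is what dropping recurrent units buys at training time.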
Database: | OpenAIRE |
External link: |