Egocentric Action Anticipation by Disentangling Encoding and Inference
Authors: | Giovanni Maria Farinella, Antonino Furnari |
Year: | 2019 |
Subject: | Action Anticipation, Egocentric Vision, EPIC-KITCHENS, First Person Vision, LSTM, Optical Flow, Wearable Computer, Inference, Encoding, Modality, Task Analysis, Machine Learning, Artificial Intelligence, Computer Science |
Source: | ICIP |
Description: | Egocentric action anticipation is the task of predicting future actions from videos collected by means of a wearable camera. Action anticipation methods should be able to continuously 1) summarize the past and 2) predict possible future actions. We observe that action anticipation benefits from explicitly disentangling these two tasks. To this end, we introduce a learning architecture which uses a "rolling" LSTM to continuously summarize the past and an "unrolling" LSTM to anticipate future actions at multiple temporal scales. The model includes a spatial and a temporal branch, which process RGB images and optical flow fields independently. The predictions of the two branches are fused using a novel modality attention mechanism which leverages the complementary nature of the two modalities. Experiments on the EPIC-KITCHENS dataset show that the proposed method surpasses the state of the art by +4.02% and +6.39% in Top-1 and Top-5 accuracy, respectively. Please see the project webpage at http://iplab.dmi.unict.it/rulstm/. |
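To make the rolling/unrolling idea from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation (which is available at the project webpage). A "rolling" LSTM consumes the observed frame features to summarize the past; an "unrolling" LSTM is initialized from the rolling state and stepped forward to produce predictions at multiple anticipation horizons, and a small attention module fuses the scores of two branches. All function names, dimensions, and the zero-input unrolling scheme here are illustrative assumptions.

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    # One LSTM step; gates i, f, o, g are stacked row-wise in W, U, b.
    z = W @ x + U @ h + b
    H = h.size
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def init_params(rng, d_in, d_h):
    # Random parameters for one LSTM (illustrative initialization).
    return (rng.standard_normal((4 * d_h, d_in)) * 0.1,
            rng.standard_normal((4 * d_h, d_h)) * 0.1,
            np.zeros(4 * d_h))

def anticipate(frames, n_steps, params_roll, params_unroll, W_out):
    # "Rolling" LSTM: summarize the observed frames into (h, c).
    d_h = params_roll[1].shape[1]
    h = c = np.zeros(d_h)
    for x in frames:
        h, c = lstm_cell(x, h, c, *params_roll)
    # "Unrolling" LSTM: start from the rolling state and step n_steps
    # times (zero input, an assumed simplification) to score actions
    # at multiple anticipation horizons.
    hu, cu = h, c
    zero = np.zeros(frames.shape[1])
    scores = []
    for _ in range(n_steps):
        hu, cu = lstm_cell(zero, hu, cu, *params_unroll)
        scores.append(W_out @ hu)
    return np.stack(scores)  # shape: (n_steps, n_classes)

def modality_attention(s_rgb, s_flow, w_att):
    # Fuse RGB and optical-flow scores with softmax weights computed
    # from the concatenated branch scores (a simplified sketch).
    a = w_att @ np.concatenate([s_rgb, s_flow])
    a = np.exp(a - a.max())
    a /= a.sum()
    return a[0] * s_rgb + a[1] * s_flow
```

In this sketch, disentangling shows up as two separate recurrences: the rolling LSTM only ever encodes the past, while the unrolling LSTM only ever extrapolates from a frozen summary, so each can specialize in its task.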
Database: | OpenAIRE |
External link: |