Action detection and classification in kitchen activities videos using graph decoding.

Authors: Ramadan, Mona; El-Jaroudi, Amro
Subject:
Source: Visual Computer; Mar 2023, Vol. 39 Issue 3, p799-812, 14p
Abstract: In this work, we propose a hybrid deep network/graph decoding using hidden Markov model system for the classification of kitchen activities for the Actions for Cooking Eggs data set. We use and compare two deep learning architectures, a deep convolutional neural network (CNN) alone and a long short-term memory network built on top of a CNN. We address the video classification problem both on the level of actions performed in certain frames and the full-length video level. Our proposed system detects a sequence of cooking actions and outputs a menu class for the entire video. Our approach achieves the highest reported accuracy on the data set for identifying cooking actions with an overall accuracy of 81% compared to the state of the art of 76% and succeeds in assigning a menu label to a sequence of cooking actions with an accuracy of 100% compared to an accuracy range of 10–30% reported in previous work. We also explore the effects of processing a subset of the available frames and imposing a state occupancy constraint during decoding. Our best reported results are achieved when using a common-sense dictionary grammar expansion when processing one frame out of every 35 frames and when restricting state transitions for at least five consecutive frames. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
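
Note: The abstract mentions imposing a state occupancy constraint during decoding, with state transitions restricted for at least five consecutive frames. The record does not say how the authors implement this, so the following is only a minimal sketch of one common way to do it: Viterbi decoding over an expanded state space in which each state carries a dwell counter. The function name `viterbi_min_duration` and all of its parameters are hypothetical, and the per-frame scores are assumed to come from some classifier such as a CNN.

```python
import numpy as np

def viterbi_min_duration(log_emis, log_trans, log_init, min_dur):
    """Best state path where a state may be left only after min_dur frames.

    log_emis:  (T, S) per-frame log scores (assumed to come from a classifier)
    log_trans: (S, S) log transition scores
    log_init:  (S,)   log initial scores
    min_dur:   minimum number of consecutive frames in a state before leaving
    The final segment is allowed to be shorter than min_dur (an assumption).
    """
    T, S = log_emis.shape
    D = min_dur
    # score[s, k]: best log score of a path ending in state s after
    # k + 1 consecutive frames in s (counter saturates at D - 1).
    score = np.full((S, D), -np.inf)
    score[:, 0] = log_init + log_emis[0]
    back = np.zeros((T, S, D, 2), dtype=int)  # (previous state, previous counter)

    for t in range(1, T):
        new = np.full((S, D), -np.inf)
        for s in range(S):
            # Stay in s: the dwell counter advances, saturating at D - 1.
            for k in range(D):
                k2 = min(k + 1, D - 1)
                cand = score[s, k] + log_trans[s, s]
                if cand > new[s, k2]:
                    new[s, k2] = cand
                    back[t, s, k2] = (s, k)
            # Enter s from another state r only once r has dwelt D frames.
            for r in range(S):
                if r == s:
                    continue
                cand = score[r, D - 1] + log_trans[r, s]
                if cand > new[s, 0]:
                    new[s, 0] = cand
                    back[t, s, 0] = (r, D - 1)
        score = new + log_emis[t][:, None]

    # Backtrace from the best final (state, counter) pair.
    s, k = np.unravel_index(np.argmax(score), score.shape)
    path = [int(s)]
    for t in range(T - 1, 0, -1):
        s, k = back[t, s, k]
        path.append(int(s))
    return path[::-1]
```

With `min_dur=1` this reduces to ordinary Viterbi decoding. Larger values suppress short spurious label switches, which is the kind of effect a state occupancy constraint is meant to have. Frame subsampling as described in the abstract (one frame out of every 35) could be applied beforehand by slicing the score matrix, for example `log_emis[::35]`.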