Context Aware Group Activity Recognition

Authors: Avijit Dasgupta, C. V. Jawahar, Karteek Alahari
Contributors: Center for Visual Information Technology (CVIT), International Institute of Information Technology, Hyderabad (IIIT-H); Thoth project-team (Apprentissage de modèles à partir de données massives), Inria Grenoble - Rhône-Alpes; Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Université Grenoble Alpes (UGA), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); ANR-18-CE23-0011 AVENUE, Réseau de mémoire visuelle pour l'interprétation de scènes (2018)
Language: English
Year of publication: 2021
Source: ICPR 2020 - International Conference on Pattern Recognition, Jan 2021, Milan (Virtual), Italy, pp. 10098-10105, ⟨10.1109/ICPR48806.2021.9412306⟩
Description: This paper addresses the task of group activity recognition in multi-person videos. Existing approaches decompose this task into feature learning and relational reasoning. Despite showing progress, these methods rely only on appearance features for people and overlook the available contextual information, which can play an important role in group activity understanding. In this work, we focus on the feature learning aspect and propose a two-stream architecture that not only considers person-level appearance features, but also makes use of contextual information present in videos for group activity recognition. In particular, we propose to use two types of contextual information beneficial for two different scenarios: pose context and scene context, which provide crucial cues for group activity understanding. We combine appearance and contextual features to encode each person with an enriched representation. Finally, these combined features are used in relational reasoning for predicting group activities. We evaluate our method on two benchmarks, Volleyball and Collective Activity, and show that joint modeling of contextual information with appearance features benefits group activity understanding.
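
To make the described pipeline concrete, here is a minimal PyTorch sketch of the two-stream fusion idea: per-person appearance features are concatenated with contextual features (e.g., pose or scene context), projected into an enriched per-person representation, passed through a relational-reasoning stage, and pooled for group-activity classification. All module choices, names, and dimensions below are illustrative assumptions, not the authors' released code; in particular, a single self-attention layer stands in for the paper's relational reasoning.

```python
import torch
import torch.nn as nn


class ContextAwareGroupActivityModel(nn.Module):
    """Hypothetical sketch: fuse appearance and context features per person,
    reason over all people, and predict the group activity."""

    def __init__(self, appearance_dim=1024, context_dim=256,
                 hidden_dim=512, num_activities=8):
        super().__init__()
        # Project concatenated appearance + context features into a single
        # enriched per-person representation (the "combined features").
        self.fusion = nn.Sequential(
            nn.Linear(appearance_dim + context_dim, hidden_dim),
            nn.ReLU(),
        )
        # Placeholder relational-reasoning stage over the people in a clip.
        self.relation = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_activities)

    def forward(self, appearance, context):
        # appearance: (batch, num_people, appearance_dim)
        # context:    (batch, num_people, context_dim)
        person = self.fusion(torch.cat([appearance, context], dim=-1))
        person = self.relation(person)   # reason across people
        group = person.mean(dim=1)       # pool to a group-level descriptor
        return self.classifier(group)    # group-activity logits


if __name__ == "__main__":
    model = ContextAwareGroupActivityModel()
    appearance = torch.randn(2, 12, 1024)  # e.g., 12 players per clip
    context = torch.randn(2, 12, 256)
    print(model(appearance, context).shape)  # torch.Size([2, 8])
```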
Database: OpenAIRE