Fast Collective Activity Recognition Under Weak Supervision
Author: | Wei-Shi Zheng, Jian-Fang Hu, Peizhen Zhang, Yongyi Tang |
---|---|
Publication year: | 2020 |
Subject: | business.industry; Computer science; Deep learning; Feature extraction; Supervised learning; Inference; 02 engineering and technology; Machine learning; computer.software_genre; Computer Graphics and Computer-Aided Design; Social group; Activity recognition; Action (philosophy); 0202 electrical engineering, electronic engineering, information engineering; Task analysis; Embedding; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Software |
Source: | IEEE Transactions on Image Processing. 29:29-43 |
ISSN: | 1941-0042 1057-7149 |
DOI: | 10.1109/tip.2019.2918725 |
Description: | Collective activity recognition, which identifies the activity a group of people is performing, is a cutting-edge research topic in computer vision. Unlike actions performed by individuals, collective activities require modelling the complex interactions among different people. However, most previous works require exhaustive annotations, such as accurate labels of individual actions, pairwise interactions, and poses, which are not easily available in practice. Moreover, most of them treat human detection as a decoupled task that precedes collective activity recognition and use all detected persons. This not only ignores the mutual relation between the two tasks, which makes it hard to filter out irrelevant people, but also likely increases the computational burden of reasoning about collective activities. In this paper, we propose a fast weakly supervised deep learning architecture for collective activity recognition. For fast inference, we make actor detection and weakly supervised collective activity reasoning collaborate in an end-to-end framework by sharing convolutional layers between them. The joint learning lets the two tasks reinforce each other, so that outliers who are not involved in the activity are filtered out more effectively. For weakly supervised learning, we propose a latent embedding scheme that mines person-group interactive relationships, removing the need for pairwise relations between people and for individual action labels. The experimental results show that the proposed framework achieves performance comparable to or better than the state of the art on three datasets. Our joint model reasons about collective activities at 22.65 fps, which is the fastest speed known to date and brings collective activity recognition substantially closer to real-time applications. |
Database: | OpenAIRE |
External link: |