Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning

Authors: Xuewei Ding, Yingwei Pan, Yehao Li, Ting Yao, Dan Zeng, Tao Mei
Year of publication: 2023
Subject:
Source: ACM Transactions on Multimedia Computing, Communications, and Applications. 19:1-18
ISSN: 1551-6865, 1551-6857
Description: Visual and spatial relationship detection in images, which aims to recognize the semantic and spatial interactions between objects in an image and thereby compose a structured semantic understanding of the scene, has been a fast-developing research topic in the multimedia field. Most existing techniques predict a relationship by directly combining the holistic image feature with the semantic and spatial features of the two given objects, leaving under-exploited the inherent supervision derived from such structured and thorough image understanding. Specifically, the inherent supervision among the objects or relations within an image spans different granularities of this hierarchy, from simple to comprehensive: (1) object-based supervision, which captures the interaction between the semantic and spatial features of each individual object; (2) inter-object supervision, which characterizes the dependency within the relationship triplet (⟨subject, predicate, object⟩); and (3) inter-relation supervision, which exploits contextual information among all relationship triplets in an image. These inherent multi-granular supervisions offer fertile ground for building self-supervised proxy tasks. In this article, we explore the multi-granular supervision as a trilogy, proceeding from the object-based to the inter-object and inter-relation perspectives. We integrate the standard relationship detection objective with a series of proposed self-supervised proxy tasks, a scheme we name Multi-Granular Self-Supervised learning (MGS). MGS is appealing because it can be plugged into any neural relationship detection model simply by including the proxy tasks during training, without increasing the computational cost at inference. Through extensive experiments on the SpatialSense and VRD datasets, we demonstrate the superiority of MGS for both spatial and visual relationship detection.
Database: OpenAIRE
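The abstract describes MGS as the standard relationship-detection objective integrated with self-supervised proxy tasks at three granularities, used only during training. The following is a minimal Python sketch of that idea, assuming the terms are combined as a simple weighted sum of scalar losses. The abstract does not say how the terms are combined, and the names `mgs_training_loss` and `MGSLossWeights`, as well as the default weights, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MGSLossWeights:
    # Hypothetical balancing weights; the abstract does not say how terms are weighted.
    object_based: float = 1.0
    inter_object: float = 1.0
    inter_relation: float = 1.0


def mgs_training_loss(
    relation_loss: float,
    object_proxy_loss: float,
    inter_object_proxy_loss: float,
    inter_relation_proxy_loss: float,
    weights: Optional[MGSLossWeights] = None,
) -> float:
    """Add three proxy-task losses to the standard relationship-detection loss.

    Assumption: a plain weighted sum. The proxy terms exist only at training
    time; inference would use the relationship predictor alone.
    """
    w = weights if weights is not None else MGSLossWeights()
    return (
        relation_loss
        + w.object_based * object_proxy_loss  # object-based granularity
        + w.inter_object * inter_object_proxy_loss  # within one relationship triplet
        + w.inter_relation * inter_relation_proxy_loss  # across all triplets in an image
    )


# Illustrative scalar values only; they are not results from the paper.
print(mgs_training_loss(1.0, 2.0, 2.0, 2.0, MGSLossWeights(0.5, 0.5, 0.5)))  # 4.0
```

Because the proxy terms enter only the training loss, an inference pass would evaluate only the relationship predictor. This is consistent with the abstract's claim that MGS adds no computational cost at inference.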