Learning interactive multi‐object segmentation through appearance embedding and spatial attention
Author: Yan Gui, Bingqiang Zhou, Jianming Zhang, Cheng Sun, Lingyun Xiang, Jin Zhang
Language: English
Year of publication: 2022
Subject:
Source: IET Image Processing, Vol 16, Iss 10, Pp 2722-2737 (2022)
Document type: article
ISSN: 1751-9667, 1751-9659
DOI: 10.1049/ipr2.12520
Description: Abstract: Deep learning approaches to interactive image segmentation are typically formulated as a binary labeling problem. A model trained to predict within a fixed set of labels (i.e., foreground and background) cannot directly predict the binary masks of multiple objects of interest, which greatly limits its flexibility and adaptivity. Different classes of clicks are therefore used as input, and the first end-to-end learning model for multi-object segmentation, based on a newly designed neural network, is developed. The network, which consists of a visual feature extractor, a recurrent attention module, and a dynamic segmentation head, extracts user-click-adapted appearance embedding features and spatial attention features, and then learns to transform this information into a segmentation of multiple objects. It is also proposed to train the network with a joint loss function that takes embedding learning into account for segmentation. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. It performs favorably against state-of-the-art approaches on the multiple-object segmentation task, running, for example, at 0.15 s per image and 0.06 s per object with a mean IoU & F1 score of 84.90% on the Pascal VOC 2012 validation set. It is further shown that the method can be used in numerous vision applications such as image recoloring and colorization. (An illustrative sketch of this pipeline follows the record.)
Database: Directory of Open Access Journals
External link:
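For orientation, below is a minimal Python (PyTorch) sketch of the pipeline the abstract describes: a visual feature extractor, a recurrent attention module that updates a hidden state once per clicked object, and a dynamic segmentation head that emits one mask per object. All module names, shapes, and the conv-based recurrent update are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

class MultiObjectClickSegmenter(nn.Module):
    """Hypothetical sketch of the described architecture; every
    component here is a stand-in, not the authors' design."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Visual feature extractor (stand-in for a real backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Recurrent attention module: a simple conv cell that fuses
        # image features, the previous hidden state, and the current
        # object's click map.
        self.update = nn.Conv2d(feat_dim * 2 + 1, feat_dim, 3, padding=1)
        # Dynamic segmentation head: one mask logit map per object.
        self.head = nn.Conv2d(feat_dim, 1, 1)

    def forward(self, image, click_maps):
        # image: (B, 3, H, W); click_maps: (B, K, H, W), one binary
        # click map per object of interest.
        feats = self.backbone(image)
        hidden = torch.zeros_like(feats)
        logits = []
        for k in range(click_maps.shape[1]):
            clicks = click_maps[:, k:k + 1]  # (B, 1, H, W)
            hidden = torch.tanh(
                self.update(torch.cat([feats, hidden, clicks], dim=1)))
            logits.append(self.head(hidden))
        # (B, K, H, W): per-object mask logits for K objects.
        return torch.cat(logits, dim=1)

if __name__ == "__main__":
    model = MultiObjectClickSegmenter()
    img = torch.randn(1, 3, 64, 64)
    clicks = torch.zeros(1, 2, 64, 64)  # two objects; click placement omitted
    print(model(img, clicks).shape)  # torch.Size([1, 2, 64, 64])

Iterating the recurrent update once per object is what lets a single network produce a variable number of masks, in contrast to the fixed foreground/background formulation the abstract criticizes; the joint training loss mentioned in the abstract would combine a per-object segmentation term with an embedding-learning term, and is not shown here.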