Learning interactive multi‐object segmentation through appearance embedding and spatial attention
Author: Yan Gui, Bingqiang Zhou, Jianming Zhang, Cheng Sun, Lingyun Xiang, Jin Zhang
Language: English
Year of publication: 2022
Subject:
Source: IET Image Processing, Vol 16, Iss 10, Pp 2722-2737 (2022)
Document type: article
ISSN: 1751-9667, 1751-9659
DOI: 10.1049/ipr2.12520
Description: Abstract: Deep learning approaches to interactive image segmentation are typically formulated as a binary labeling problem. A model trained to predict within a fixed set of labels (i.e., foreground and background) cannot directly predict the binary masks of multiple objects of interest, which greatly limits its flexibility and adaptivity. Different classes of clicks are therefore used as input, and the first end-to-end learning model for multi-object segmentation, based on a newly designed neural network, is developed. The network, which consists of a visual feature extractor, a recurrent attention module, and a dynamic segmentation head, extracts user-click-adapted appearance embedding features and spatial attention features, and then learns to transform this information into a segmentation of multiple objects. It is also proposed to train the network with a joint loss function that takes embedding learning into account for segmentation. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. It performs favorably against state-of-the-art approaches on the multiple-object segmentation task, running, for example, at 0.15 s per image and 0.06 s per object with a mean IoU & F1 score of 84.90% on the Pascal VOC 2012 validation set. It is further shown that the method can be used in numerous vision applications such as image recoloring and colorization. (An illustrative sketch of this pipeline follows the record.)
Database: Directory of Open Access Journals
External link:
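For orientation, below is a minimal Python (PyTorch) sketch of the pipeline the abstract describes: a visual feature extractor, a recurrent attention module that updates a hidden state once per clicked object, and a dynamic segmentation head that emits one mask per object. All module names, shapes, and the conv-based recurrent update are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

class MultiObjectClickSegmenter(nn.Module):
    """Hypothetical sketch of the described architecture; every
    component here is a stand-in, not the authors' design."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Visual feature extractor (stand-in for a real backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Recurrent attention module: a simple conv cell that fuses
        # image features, the previous hidden state, and the current
        # object's click map.
        self.update = nn.Conv2d(feat_dim * 2 + 1, feat_dim, 3, padding=1)
        # Dynamic segmentation head: one mask logit map per object.
        self.head = nn.Conv2d(feat_dim, 1, 1)

    def forward(self, image, click_maps):
        # image: (B, 3, H, W); click_maps: (B, K, H, W), one binary
        # click map per object of interest.
        feats = self.backbone(image)
        hidden = torch.zeros_like(feats)
        logits = []
        for k in range(click_maps.shape[1]):
            clicks = click_maps[:, k:k + 1]  # (B, 1, H, W)
            hidden = torch.tanh(
                self.update(torch.cat([feats, hidden, clicks], dim=1)))
            logits.append(self.head(hidden))
        # (B, K, H, W): per-object mask logits for K objects.
        return torch.cat(logits, dim=1)

if __name__ == "__main__":
    model = MultiObjectClickSegmenter()
    img = torch.randn(1, 3, 64, 64)
    clicks = torch.zeros(1, 2, 64, 64)  # two objects; click placement omitted
    print(model(img, clicks).shape)  # torch.Size([1, 2, 64, 64])

Iterating the recurrent update once per object is what lets a single network produce a variable number of masks, in contrast to the fixed foreground/background formulation the abstract criticizes; the joint training loss mentioned in the abstract would combine a per-object segmentation term with an embedding-learning term, and is not shown here.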