Authors:	Sauvalle, Bruno; de La Fortelle, Arnaud
Publication year:	2022
Subject:	Computer Science - Computer Vision and Pattern Recognition
Document type:	Working Paper
Description:	We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to predict the coordinates of the objects present in the scene and to associate a feature vector with each object. A transformer encoder handles occlusions and redundant detections, and a convolutional autoencoder reconstructs the background. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks.
Database:	arXiv
External link:	http://arxiv.org/abs/2205.13271