Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification

Author:	Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, Thomas B. Moeslund
Language:	English
Year of publication:	2022
Subjects:	Sewer Inspection; Vision Transformers; Control and Systems Engineering; Convolutional Neural Networks; Sinkhorn-Knopp; Closed-Circuit Television; Sewer Defect Classification; Building and Construction; Civil and Structural Engineering
Source:	Haurum, J B, Madadi, M, Guerrero, S E & Moeslund, T B 2022, 'Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification', Automation in Construction, vol. 144, 104614. https://doi.org/10.1016/j.autcon.2022.104614
DOI:	10.1016/j.autcon.2022.104614
Description:	A crucial part of image classification is capturing the non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated non-locally at different scales by a lightweight vision transformer, and a smaller set of tokens is produced by a novel Sinkhorn clustering-based tokenizer that uses distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
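The description names a Sinkhorn clustering-based tokenizer but does not give its details. As a rough illustration only, the NumPy sketch below shows a generic entropy-regularized Sinkhorn-Knopp balanced assignment that pools N spatial features into K << N tokens. It is not the authors' implementation: the function names `sinkhorn_knopp` and `sinkhorn_tokenize`, the cosine-similarity scores, the `epsilon` and `n_iters` defaults, and the weighted-mean pooling are all hypothetical choices made for this example.

```python
import numpy as np


def sinkhorn_knopp(scores, n_iters=3, epsilon=0.05):
    """Hypothetical sketch: balanced soft assignment of N features to K clusters.

    scores: (N, K) similarities between features and cluster centers.
    Returns Q of shape (N, K) whose rows sum to 1; with enough iterations
    each column sums to roughly N / K, i.e. clusters receive equal mass.
    """
    # Subtract the max before exponentiating for numerical stability.
    q = np.exp((scores - scores.max()) / epsilon)
    n, k = q.shape
    q /= q.sum()
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True)  # each cluster (column) ...
        q /= k                             # ... gets total mass 1/K
        q /= q.sum(axis=1, keepdims=True)  # each feature (row) ...
        q /= n                             # ... gets total mass 1/N
    return q * n  # rescale so every row sums to 1


def sinkhorn_tokenize(features, centers, n_iters=3, epsilon=0.05):
    """Hypothetical tokenizer: pool N features (N, D) into K tokens (K, D).

    centers: (K, D) cluster centers, assumed learnable in a real model.
    Each token is a convex combination of the input features, weighted by
    the balanced Sinkhorn assignment to its cluster center.
    """
    # Cosine similarity between L2-normalized features and centers.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    q = sinkhorn_knopp(f @ c.T, n_iters, epsilon)  # (N, K)
    # Normalize each column so the pooling weights for a token sum to 1.
    weights = q / q.sum(axis=0, keepdims=True)
    return weights.T @ features  # (K, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(14 * 14, 64))  # hypothetical 14x14 map, 64 channels
    centers = rng.normal(size=(8, 64))      # hypothetical 8 cluster centers
    tokens = sinkhorn_tokenize(feats, centers)
    print(tokens.shape)  # (8, 64)
```

The balancing step is what distinguishes this from plain soft k-means: by forcing every cluster to receive roughly the same total assignment mass, no single center can absorb all features, which is one plausible way to keep the resulting tokens distinct.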
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::698a8c11f40098b4308d0cd4b2a98dde
	https://vbn.aau.dk/ws/files/492073391/1_s2.0_S0926580522004848_main.pdf