Author: |
Christos Sevastopoulos, Mohammad Zaki Zadeh, Michail Theofanidis, Sneh Acharya, Nishi Patel, Fillia Makedon |
Language: |
English |
Year of publication: |
2023 |
Subject: |
|
Source: |
Technologies, Vol 11, Iss 3, p 64 (2023) |
Document type: |
article |
ISSN: |
2227-7080 |
DOI: |
10.3390/technologies11030064 |
Description: |
This article presents a method for extracting high-level semantic information through successful landmark detection using 2D RGB images. In particular, the focus is placed on the presence of particular labels (open path, humans, staircase, doorways, obstacles) in the encountered scene, which can be a fundamental source of information enhancing scene understanding and paving the path towards the safe navigation of the mobile unit. Experiments are conducted using a manual wheelchair to gather image instances from four indoor academic environments consisting of multiple labels. Afterwards, the fine-tuning of a pretrained vision transformer (ViT) is conducted, and the performance is evaluated through an ablation study versus well-established state-of-the-art deep architectures for image classification such as ResNet. Results show that the fine-tuned ViT outperforms all other deep convolutional architectures while achieving satisfactory levels of generalization. |
Database: |
Directory of Open Access Journals |
External link: |
|