Authors: |
João Daniel Silva, Bruno Martins, João Magalhães |
Language: |
English |
Year of publication: |
2023 |
Subject: |
|
Source: |
Intelligent Systems with Applications, Vol 18, Iss , Pp 200221- (2023) |
Document type: |
article |
ISSN: |
2667-3053 |
DOI: |
10.1016/j.iswa.2023.200221 |
Description: |
Models for Visual Question Answering (VQA) on medical images aim to answer diagnostically relevant natural-language questions based on the visual content of the images. In this article, we propose a novel approach to this problem that combines a strong image encoder based on EfficientNetV2 with a multimodal encoder based on the RealFormer architecture. Our model is pre-trained with a strategy that includes a contrastive objective, and the final fine-tuning for the VQA task uses a loss function that specifically addresses class imbalance. Experimental results on the VQA-Med dataset from ImageCLEF 2019 confirm the effectiveness of our approach and show the potential benefits of combining multimodal pre-training with recent advances in neural network architectures. |
Database: |
Directory of Open Access Journals |
External link: |
|
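The abstract names RealFormer as the basis of the multimodal encoder. What sets RealFormer apart is "residual attention": each layer adds the previous layer's raw (pre-softmax) attention scores to its own before applying softmax, and then passes the summed scores on to the next layer. The sketch below illustrates only that mechanism, for a single head in pure Python. It is not the authors' implementation. The function names are hypothetical, the learned Q/K/V projections and multiple heads of a real model are omitted, and this record does not describe the paper's actual layer configuration.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def residual_attention(q: Matrix, k: Matrix, v: Matrix,
                       prev_scores: Optional[Matrix] = None) -> Tuple[Matrix, Matrix]:
    """Single-head residual attention in the style of RealFormer (hypothetical helper).

    Returns (output, scores). `scores` holds the pre-softmax logits, which the
    caller feeds to the next layer as `prev_scores`.
    """
    d_k = len(q[0])
    # Scaled dot-product logits: Q K^T / sqrt(d_k).
    scores = [[x / math.sqrt(d_k) for x in row] for row in matmul(q, transpose(k))]
    if prev_scores is not None:
        # Residual connection on the raw attention logits (the RealFormer idea).
        scores = [[s + p for s, p in zip(row, prow)]
                  for row, prow in zip(scores, prev_scores)]
    weights = softmax_rows(scores)
    return matmul(weights, v), scores


# Toy usage: two stacked layers. Identity projections are used, so Q = K = V = input.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out1, s1 = residual_attention(x, x, x)
out2, s2 = residual_attention(out1, out1, out1, prev_scores=s1)
```

A standard Transformer layer would discard `s1` once the first layer's softmax is computed. Here the second layer's logits are its own scaled dot products plus `s1`, so the attention pattern from earlier layers carries forward through the stack.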