Medical visual question answering via corresponding feature fusion combined with semantic attention

Authors: Han Zhu, Xiaohai He, Meiling Wang, Mozhi Zhang, Linbo Qing
Language: English
Publication year: 2022
Subject:
Source: Mathematical Biosciences and Engineering, Vol 19, Iss 10, Pp 10192-10212 (2022)
Document type: article
ISSN: 1551-0018
DOI: 10.3934/mbe.2022478
Description: Medical visual question answering (Med-VQA) aims to leverage a pre-trained artificial intelligence model to answer clinical questions raised by doctors or patients about radiology images. However, owing to the high professional requirements of the medical field and the difficulty of annotating medical data, Med-VQA lacks sufficiently large-scale, well-annotated radiology image datasets for training. Researchers have mainly addressed this problem by improving the model's visual feature extractor. In contrast, little research has focused on textual feature extraction, and most existing work underestimates the interactions between corresponding visual and textual features. In this study, we propose a corresponding feature fusion (CFF) method to strengthen the interactions between specific features from corresponding radiology images and questions. In addition, we design a semantic attention (SA) module for textual feature extraction, which helps the model focus on the meaningful words in each question while reducing the attention spent on insignificant information. Extensive experiments demonstrate that the proposed method achieves competitive results on two benchmark datasets and outperforms existing state-of-the-art methods in answer prediction accuracy. The experimental results also show that our model is capable of semantic understanding during answer prediction, which offers certain advantages in Med-VQA.
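To make the two ideas in the description concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): semantic attention is illustrated as a learned query vector scoring each word embedding so that meaningful words receive higher weight, and corresponding feature fusion is illustrated here as a simple element-wise product of the attended textual feature with a visual feature; the paper's CFF and SA modules are more elaborate, and all names and dimensions below are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def semantic_attention(word_feats, query):
    # word_feats: (T, d) word embeddings for a question of T words
    # query: (d,) learned context vector (hypothetical parameter)
    scores = word_feats @ query        # relevance score per word, shape (T,)
    weights = softmax(scores)          # attention distribution over words
    return weights @ word_feats        # attended textual feature, shape (d,)

def corresponding_feature_fusion(visual_feat, text_feat):
    # element-wise product is one common fusion choice;
    # the paper's CFF module differs in detail
    return visual_feat * text_feat

rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))        # toy question: 5 words, 8-dim embeddings
q = rng.normal(size=8)                 # toy attention query
text_feat = semantic_attention(words, q)
visual_feat = rng.normal(size=8)       # toy image feature
fused = corresponding_feature_fusion(visual_feat, text_feat)
print(fused.shape)                     # (8,)
```

The fused vector would then be fed to an answer classifier; in a trained model the query vector and feature extractors are learned, so insignificant words receive near-zero attention weight.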
Database: Directory of Open Access Journals