Beyond images: an integrative multi-modal approach to chest x-ray report generation

Authors:	Nurbanu Aksoy, Serge Sharoff, Selcuk Baser, Nishant Ravikumar, Alejandro F. Frangi
Language:	English
Publication year:	2024
Subject:	report generation; transformers; cross attention; multi-modal data; x-ray; deep learning; Medical physics. Medical radiology. Nuclear medicine; R895-920
Source:	Frontiers in Radiology, Vol 4 (2024)
Document type:	article
ISSN:	2673-8740
DOI:	10.3389/fradi.2024.1339612
Description:	Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. In this paper, we present a novel multi-modal deep neural network framework for generating chest x-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes. We introduce a conditioned cross-multi-head attention module to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data. Experiments demonstrate substantial improvements from using additional modalities compared to relying on images alone. Notably, our model achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models in the literature. Furthermore, we employ both human evaluation and clinical semantic similarity measurement alongside word-overlap metrics to improve the depth of quantitative analysis. A human evaluation, conducted by a board-certified radiologist, confirms the model’s accuracy in identifying high-level findings; however, it also highlights that more improvement is needed to capture nuanced details and clinical context.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/47a3ca7ed6e047a98c45305b06821870
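
Note on the ROUGE-L metric named in the description: ROUGE-L scores a generated text against a reference by the length of their longest common subsequence (LCS) of tokens. The following is a minimal sketch of the sentence-level F-measure, assuming lowercase whitespace tokenization and beta = 1. It is not the authors' evaluation code; the function names `lcs_length` and `rouge_l` and the example sentences are hypothetical, and published scores are normally computed with an established toolkit whose tokenization and beta may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """Sentence-level ROUGE-L F-measure (assumed whitespace tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Hypothetical report sentences, for illustration only.
score = rouge_l("no acute cardiopulmonary process", "no acute process")
```

Here the LCS is three tokens long, so precision is 3/3, recall is 3/4, and the F-measure with beta = 1 is 6/7. Because the score rewards shared word order rather than clinical correctness, it complements the human evaluation and clinical semantic similarity measurement mentioned in the description but cannot replace them.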