Building RadiologyNET: an unsupervised approach to annotating a large-scale multimodal medical database.
Authors: | Napravnik M; Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka, 51000, Croatia., Hržić F; Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka, 51000, Croatia.; Center for Artificial Intelligence and Cybersecurity, Radmile Matejcic 2, Rijeka, 51000, Croatia., Tschauner S; Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Neue Stiftingtalstraße 6, Graz, 8010, Austria., Štajduhar I; Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka, 51000, Croatia. ivan.stajduhar@uniri.hr.; Center for Artificial Intelligence and Cybersecurity, Radmile Matejcic 2, Rijeka, 51000, Croatia. ivan.stajduhar@uniri.hr. |
---|---|
Language: | English |
Source: | BioData mining [BioData Min] 2024 Jul 12; Vol. 17 (1), pp. 22. Date of Electronic Publication: 2024 Jul 12. |
DOI: | 10.1186/s13040-024-00373-1 |
Abstract: | Background: The use of machine learning in medical diagnosis and treatment has grown significantly in recent years with the development of computer-aided diagnosis systems, often based on annotated medical radiology images. However, the lack of large annotated image datasets remains a major obstacle, as the annotation process is time-consuming and costly. This study aims to overcome this challenge by proposing an automated method for annotating a large database of medical radiology images based on their semantic similarity. Results: An automated, unsupervised approach is used to create a large annotated dataset of medical radiology images originating from the Clinical Hospital Centre Rijeka, Croatia. The pipeline is built by data-mining three different types of medical data: images, DICOM metadata and narrative diagnoses. The optimal feature extractors are then integrated into a multimodal representation, which is then clustered to create an automated pipeline for labelling a precursor dataset of 1,337,926 medical images into 50 clusters of visually similar images. The quality of the clusters is assessed by examining their homogeneity and mutual information, taking into account the anatomical region and modality representation. Conclusions: The results indicate that fusing the embeddings of all three data sources together provides the best results for the task of unsupervised clustering of large-scale medical data and leads to the most concise clusters. Hence, this work marks the initial step towards building a much larger and more fine-grained annotated dataset of medical radiology images. (© 2024. The Author(s).) |
Database: | MEDLINE |
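
The abstract states that cluster quality is assessed through homogeneity and mutual information with respect to modality and anatomical region. As a rough illustration of what those two quantities measure, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function names, the toy modality labels, and the cluster assignments are all hypothetical, and the paper may use a differently normalized variant of these scores.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels, clusters):
    """Mutual information (in nats) between reference labels and cluster ids."""
    n = len(labels)
    joint = Counter(zip(labels, clusters))
    label_counts = Counter(labels)
    cluster_counts = Counter(clusters)
    mi = 0.0
    for (lab, clu), c in joint.items():
        # p(lab, clu) * log( p(lab, clu) / (p(lab) * p(clu)) )
        mi += (c / n) * log(c * n / (label_counts[lab] * cluster_counts[clu]))
    return mi


def homogeneity(labels, clusters):
    """Homogeneity = I(labels; clusters) / H(labels).

    Equals 1.0 when every cluster contains items of a single label.
    """
    h = entropy(labels)
    if h == 0.0:
        return 1.0  # only one label present, so any clustering is homogeneous
    return mutual_information(labels, clusters) / h


# Hypothetical toy data: six images labelled by imaging modality.
modalities = ["CT", "CT", "MR", "MR", "XR", "XR"]
pure_clusters = [0, 0, 1, 1, 2, 2]    # each cluster holds one modality
mixed_clusters = [0, 0, 0, 0, 1, 1]   # cluster 0 mixes CT and MR

pure_score = homogeneity(modalities, pure_clusters)
mixed_score = homogeneity(modalities, mixed_clusters)
print(f"pure:  {pure_score:.3f}")
print(f"mixed: {mixed_score:.3f}")
```

A clustering in which each cluster contains a single modality scores at the top of the range, while merging two modalities into one cluster lowers the score. The same functions could be applied to anatomical-region labels in place of modality labels.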