[Automatic ICD-10 coding: Natural language processing for German MRI reports].

Authors: Mittermeier A; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland. Andreas.Mittermeier@med.uni-muenchen.de.; Munich Center for Machine Learning (MCML), München, Deutschland. Andreas.Mittermeier@med.uni-muenchen.de., Aßenmacher M; Institut für Statistik, LMU München, München, Deutschland., Schachtner B; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland.; Munich Center for Machine Learning (MCML), München, Deutschland., Grosu S; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland., Dakovic V; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland., Kandratovich V; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland., Sabel B; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland., Ingrisch M; Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, München, Deutschland.; Munich Center for Machine Learning (MCML), München, Deutschland.
Language: German
Source: Radiologie (Heidelberg, Germany) [Radiologie (Heidelb)] 2024 Oct; Vol. 64 (10), pp. 793-800. Date of Electronic Publication: 2024 Aug 09.
DOI: 10.1007/s00117-024-01349-2
Abstract: Background: The medical coding of radiology reports is essential for good quality of care and correct billing, but it is also a complex and error-prone task.
Objective: To assess the performance of natural language processing (NLP) for ICD-10 coding of German radiology reports by fine-tuning suitable language models.
Material and Methods: This retrospective study included all magnetic resonance imaging (MRI) radiology reports acquired at our institution between 2010 and 2020. The ICD-10 discharge codes were matched to the corresponding reports to construct a dataset for multiclass classification. GermanBERT and flanT5 were fine-tuned on the total dataset (ds_total), which contained 1035 different ICD-10 codes, and on 2 reduced subsets containing the 100 (ds_100) and 50 (ds_50) most frequent codes. Model performance was assessed using top‑k accuracy for k = 1, 3 and 5. In an ablation study, both models were trained on the accompanying metadata alone and on the radiology report alone.
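The construction of the reduced subsets can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `keep_most_frequent`, the (report text, code) tuple layout, and the toy records are assumptions made purely for illustration, and the codes shown are placeholders rather than data from the study.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str]  # (report text, ICD-10 code); layout is an assumption


def keep_most_frequent(records: List[Record], n: int) -> List[Record]:
    """Keep only the records whose code is among the n most frequent codes."""
    counts = Counter(code for _, code in records)
    # most_common(n) returns the n codes with the highest counts
    frequent = {code for code, _ in counts.most_common(n)}
    return [(text, code) for text, code in records if code in frequent]


# Toy records with placeholder codes (hypothetical, not study data)
records = [
    ("report 1", "M51.1"),
    ("report 2", "M51.1"),
    ("report 3", "M51.1"),
    ("report 4", "G35"),
    ("report 5", "G35"),
    ("report 6", "I63.9"),
]

subset = keep_most_frequent(records, 2)
print(len(subset))                            # 5 records remain
print(sorted({code for _, code in subset}))   # ['G35', 'M51.1']
```

Restricting the label set this way discards every report whose code falls outside the n most frequent, which is why the reduced subsets contain fewer reports than the total dataset.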
Results: The total dataset comprised 100,672 radiology reports; the reduced subsets ds_100 and ds_50 comprised 68,103 and 52,293 reports, respectively. Model performance improved when more of the model's top-ranked predictions were taken into consideration, when the number of target classes was reduced, and when the metadata were combined with the report. flanT5 outperformed GermanBERT across all datasets and metrics and is suited as a medical coding assistant, achieving a top-3 accuracy of nearly 70% on the real-world dataset ds_total.
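For readers unfamiliar with the metric, top-k accuracy counts a prediction as correct when the true code is among the k highest-scoring candidates. The following is a minimal sketch under that standard definition; the function name, the score matrix, and the labels are invented for illustration and do not reproduce the study's models or results.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest scores."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores for 4 samples over 4 classes (hypothetical values)
scores = [
    [0.6, 0.3, 0.1, 0.0],  # true class 0 is ranked 1st
    [0.2, 0.5, 0.3, 0.0],  # true class 2 is ranked 2nd
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked 4th
    [0.1, 0.2, 0.3, 0.4],  # true class 1 is ranked 3rd
]
labels = [0, 2, 3, 1]

print(top_k_accuracy(scores, labels, 1))  # 0.25
print(top_k_accuracy(scores, labels, 3))  # 0.75
print(top_k_accuracy(scores, labels, 5))  # 1.0
```

Because the top-k candidate set only grows with k, the metric can never decrease as k increases, which is consistent with the reported gain when several of the best predictions are considered.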
Conclusion: Fine-tuned language models can reliably predict ICD-10 codes of German magnetic resonance imaging (MRI) radiology reports across various settings. As a coding assistant, flanT5 can guide medical coders to make informed decisions and potentially reduce their workload.
(© 2024. The Author(s), under exclusive licence to Springer Medizin Verlag GmbH, ein Teil von Springer Nature.)
Database: MEDLINE