MetBERT: a generalizable and pre-trained deep learning model for the prediction of metastatic cancer from clinical notes

Authors: Liu, Ke; Kulkarni, Omkar; Witteveen-Lane, Martin; Chen, Bin; Chesla, Dave
Language: English
Year of publication: 2022
Source: AMIA Annu Symp Proc
Description: Distant metastasis is the major cause of cancer-related deaths; however, early diagnosis of cancer metastasis remains a significant challenge. Recent advances in pre-trained natural language processing models, coupled with the accumulation of publicly available Electronic Health Records (EHR) data, provide an unprecedented opportunity to tackle this challenge computationally. Here, we fine-tuned multiple state-of-the-art BERT-based models using discharge summaries from the open MIMIC-III dataset and derived MetBERT, a novel model tailored to predict cancer metastasis from clinical notes. MetBERT achieved high performance (AUC = 0.94) on our in-house validation dataset, suggesting that it generalizes well. In addition, MetBERT made it possible to determine the date of cancer metastasis from the rich information in clinical notes and could therefore potentially be deployed as a tool for early diagnosis. Finally, we interpreted MetBERT at different scales and revealed a possible association between radiation therapy and metastasis risk in multiple cancer types.
Database: OpenAIRE