CancerLLM: A Large Language Model in Cancer Domain

Autor: Li, Mingchen, Huang, Jiatan, Yeung, Jeremy, Blaes, Anne, Johnson, Steven, Liu, Hongfang, Xu, Hua, Zhang, Rui
Rok vydání: 2024
Předmět:
Druh dokumentu: Working Paper
Popis: Medical Large Language Models (LLMs) such as ClinicalCamel 70B, Llama3-OpenBioLLM 70B have demonstrated impressive performance on a wide variety of medical NLP task.However, there still lacks a large language model (LLM) specifically designed for cancer domain. Moreover, these LLMs typically have billions of parameters, making them computationally expensive for healthcare systems.Thus, in this study, we propose CancerLLM, a model with 7 billion parameters and a Mistral-style architecture, pre-trained on 2,676,642 clinical notes and 515,524 pathology reports covering 17 cancer types, followed by fine-tuning on three cancer-relevant tasks, including cancer phenotypes extraction, and cancer diagnosis generation. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results compared to other existing LLMs, with an average F1 score improvement of 7.61 %. Additionally, CancerLLM outperforms other models on two proposed robustness testbeds. This illustrates that CancerLLM can be effectively applied to clinical AI systems, enhancing clinical research and healthcare delivery in the field of cancer.
Comment: add the diagnosis evaluation of ICD code
Databáze: arXiv