Me LLaMA: Foundation Large Language Models for Medical Applications
Autor: | Xie, Qianqian, Chen, Qingyu, Chen, Aokun, Peng, Cheng, Hu, Yan, Lin, Fongci, Peng, Xueqing, Huang, Jimin, Zhang, Jeffrey, Keloth, Vipina, Zhou, Xinyu, Qian, Lingfei, He, Huan, Shung, Dennis, Ohno-Machado, Lucila, Wu, Yonghui, Xu, Hua, Bian, Jiang |
---|---|
Rok vydání: | 2024 |
Předmět: | |
Druh dokumentu: | Working Paper |
Popis: | Recent advancements in large language models (LLMs) like ChatGPT and LLaMA show promise in medical applications, yet challenges remain in medical language comprehension. This study presents Me-LLaMA, a new medical LLM family based on open-source LLaMA models, optimized for medical text analysis and diagnosis by leveraging large-scale, domain-specific datasets. The Me-LLaMA family, including foundation models Me-LLaMA 13/70B and their chat-enhanced versions, was developed through continued pre-training and instruction tuning with 129B tokens and 214K samples from biomedical and clinical sources. Training the 70B models required over 100,000 A100 GPU hours. Me-LLaMA's performance was evaluated across six medical text analysis tasks using 12 benchmark datasets and complex clinical case diagnosis, with automatic and human evaluations. Results indicate Me-LLaMA outperforms LLaMA and other open-source medical LLMs in zero-shot and supervised settings. Task-specific tuning further boosts performance, surpassing ChatGPT on 7 of 8 datasets and GPT-4 on 5 of 8. For complex clinical cases, Me-LLaMA achieves performance comparable to ChatGPT and GPT-4. This work underscores the importance of domain-specific data in developing medical LLMs and addresses the high computational costs involved in training, highlighting a balance between pre-training and fine-tuning strategies. Me-LLaMA models are now accessible under user agreements, providing a valuable resource for advancing medical AI. Comment: 21 pages, 4 figures, 8 tables |
Databáze: | arXiv |
Externí odkaz: |