Evaluation of ChatGPT and Gemini large language models for pharmacometrics with NONMEM.

Author: Shin E, Yu Y, Bies RR, Ramanathan M; Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214-8033, USA. Corresponding author: Ramanathan M, Murali@Buffalo.Edu.
Language: English
Source: Journal of pharmacokinetics and pharmacodynamics [J Pharmacokinet Pharmacodyn] 2024 Jun; Vol. 51 (3), pp. 187-197. Date of Electronic Publication: 2024 Apr 24.
DOI: 10.1007/s10928-024-09921-y
Abstract: The objective was to assess the ChatGPT 4.0 (ChatGPT) and Gemini Ultra 1.0 (Gemini) large language models on NONMEM coding tasks relevant to pharmacometrics and clinical pharmacology. ChatGPT and Gemini were assessed on tasks mimicking real-world applications of NONMEM. The tasks ranged from providing a curriculum for learning NONMEM and an overview of NONMEM code structure to generating code. Lay-language prompts were used to elicit NONMEM code for a linear pharmacokinetic (PK) model with oral administration and for a more complex model with two parallel first-order absorption mechanisms. Reproducibility and the impact of the "temperature" hyperparameter setting were assessed, and the generated code was reviewed by two NONMEM experts. ChatGPT and Gemini provided NONMEM curriculum structures combining foundational knowledge with advanced concepts (e.g., covariate modeling and Bayesian approaches) and practical skills, including NONMEM code structure and syntax. ChatGPT provided an informative summary of the NONMEM control stream structure and outlined the key NONMEM Translator (NM-TRAN) records needed. Both models generated code blocks for the NONMEM control stream from the lay-language prompts for the two coding tasks; however, the control streams contained focal structural and syntax errors that required revision before they could be executed without errors or warnings. The code output from ChatGPT and Gemini was not reproducible, and varying the temperature hyperparameter did not substantively reduce the errors and omissions. Large language models may be useful in pharmacometrics for efficiently generating an initial coding template for modeling projects, but the output can contain errors and omissions that require correction.
(© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.)
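Editor's illustration: below is a minimal sketch of the kind of NONMEM control stream the first lay-language prompt targeted, a one-compartment linear PK model with first-order oral absorption. This is not code from the study; the ADVAN2/TRANS2 subroutine choice, dataset name (data.csv), column layout, and initial estimates are illustrative assumptions.

    $PROBLEM One-compartment linear PK, first-order oral absorption (illustrative)
    $DATA data.csv IGNORE=@         ; hypothetical dataset name
    $INPUT ID TIME AMT DV MDV EVID  ; assumed column layout
    $SUBROUTINES ADVAN2 TRANS2      ; built-in one-compartment model with first-order absorption
    $PK
      KA = THETA(1)*EXP(ETA(1))     ; absorption rate constant (1/h)
      CL = THETA(2)*EXP(ETA(2))     ; clearance (L/h)
      V  = THETA(3)*EXP(ETA(3))     ; central volume (L)
      S2 = V                        ; scale amounts in the central compartment to concentrations
    $ERROR
      IPRED = F
      Y = IPRED*(1+EPS(1))          ; proportional residual error
    $THETA
      (0, 1)                        ; KA
      (0, 5)                        ; CL
      (0, 50)                       ; V
    $OMEGA
      0.1 0.1 0.1                   ; diagonal between-subject variances
    $SIGMA
      0.04                          ; proportional error variance
    $ESTIMATION METHOD=1 INTER MAXEVAL=9999 PRINT=5
    $TABLE ID TIME DV IPRED NOPRINT FILE=sdtab001

Even a template this small shows the NM-TRAN records ($PK, $ERROR, $THETA, $ESTIMATION, etc.) where, per the abstract, the LLM-generated control streams contained structural and syntax errors needing expert revision before error-free execution.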
Database: MEDLINE