Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments.

Autor:	Künzle P; Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany. paul.kuenzle@charite.de., Paris S; Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany.
Jazyk:	angličtina
Zdroj:	Clinical oral investigations [Clin Oral Investig] 2024 Oct 07; Vol. 28 (11), pp. 575. Date of Electronic Publication: 2024 Oct 07.
DOI:	10.1007/s00784-024-05968-w
Abstrakt:	Objectives: The advent of artificial intelligence (AI) and large language model (LLM)-based AI applications (LLMAs) has tremendous implications for our society. This study analyzed the performance of LLMAs on solving restorative dentistry and endodontics (RDE) student assessment questions. Materials and Methods: 151 questions from a RDE question pool were prepared for prompting using LLMAs from OpenAI (ChatGPT-3.5,-4.0 and -4.0o) and Google (Gemini 1.0). Multiple-choice questions were sorted into four question subcategories, entered into LLMAs and answers recorded for analysis. P-value and chi-square statistical analyses were performed using Python 3.9.16. Results: The total answer accuracy of ChatGPT-4.0o was the highest, followed by ChatGPT-4.0, Gemini 1.0 and ChatGPT-3.5 (72%, 62%, 44% and 25%, respectively) with significant differences between all LLMAs except GPT-4.0 models. The performance on subcategories direct restorations and caries was the highest, followed by indirect restorations and endodontics. Conclusions: Overall, there are large performance differences among LLMAs. Only the ChatGPT-4 models achieved a success ratio that could be used with caution to support the dental academic curriculum. Clinical Relevance: While LLMAs could support clinicians to answer dental field-related questions, this capacity depends strongly on the employed model. The most performant model ChatGPT-4.0o achieved acceptable accuracy rates in some subject sub-categories analyzed. (© 2024. The Author(s).)
Databáze:	MEDLINE
Externí odkaz:	Zobrazit plný text záznamu Full text from SpringerLink