Finding the Needle in the Haystack: Can Natural Language Processing of Students' Evaluations of Teachers Identify Teaching Concerns?

Authors: Dine CJ, Shea JA, Clancy CB, Heath JK, Pluta W, Kogan JR (koganj@pennmedicine.upenn.edu); all at the Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
Language: English
Source: Journal of general internal medicine [J Gen Intern Med] 2024 Aug 21. Date of Electronic Publication: 2024 Aug 21.
DOI: 10.1007/s11606-024-08990-6
Abstract: Background: Institutions rely on student evaluations of teaching (SET) to ascertain teaching quality. Manual review of narrative comments can identify faculty with teaching concerns but can be resource- and time-intensive.
Aim: To determine if natural language processing (NLP) of SET comments completed by learners on clinical rotations can identify teaching quality concerns.
Setting and Participants: Single-institution retrospective cohort analysis of SET (n = 11,850) from clinical rotations between July 1, 2017, and June 30, 2018.
Program Description: The performance of three NLP dictionaries created by the research team was compared to an off-the-shelf Sentiment Dictionary.
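A dictionary-based approach of this kind can be sketched roughly as follows. This is a minimal illustration only, not the study's implementation: the term list `CONCERN_TERMS` and the function `flag_comment` are hypothetical, and the abstract does not describe the actual dictionaries' contents or matching rules.

```python
import re

# Hypothetical concern terms for illustration; the study's dictionaries
# are not listed in the abstract.
CONCERN_TERMS = {"dismissive", "unprofessional", "disorganized"}


def flag_comment(comment, terms=CONCERN_TERMS):
    """Return True if any dictionary term appears as a word in the comment."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = set(re.findall(r"[a-z']+", comment.lower()))
    return bool(tokens & set(terms))


flagged = flag_comment("The attending was dismissive of questions.")
not_flagged = flag_comment("Excellent teacher, very supportive.")
```

Under these assumptions, a comment is flagged when it shares at least one word with the dictionary; a real dictionary would likely also need to handle phrases, negation, and word variants.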
Program Evaluation: The Expert Dictionary had an accuracy of 0.90, a precision of 0.62, and a recall of 0.50. The Qualifier Dictionary had lower accuracy (0.65) and precision (0.16) but similar recall (0.67). The Text Mining Dictionary had an accuracy of 0.78 and a recall of 0.24. The Sentiment plus Qualifier Dictionary had good accuracy (0.86) and recall (0.77) with a precision of 0.37.
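For readers less familiar with these metrics, the sketch below shows how accuracy, precision, and recall follow from confusion-matrix counts. The counts used are hypothetical and are not taken from the study; they only illustrate how accuracy can stay high while precision is modest when concerning evaluations are rare.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of flagged evaluations that truly reflect a concern.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true concerns that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts (not from the study): 100 true concerns in 1,000 evaluations.
metrics = classification_metrics(tp=50, fp=30, fn=50, tn=870)
```

With these illustrative counts, accuracy is 0.92, precision is 0.625, and recall is 0.50.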
Discussion: NLP methods can identify teaching quality concerns with good accuracy and reasonable recall, but relatively low precision. An existing, free, NLP sentiment analysis dictionary can perform nearly as well as dictionaries requiring expert coding or manual creation.
(© 2024. The Author(s).)
Database: MEDLINE