Measurement precision at the cut score in medical multiple choice exams: Theory matters
Authors: | Stefan K. Schauber, Sören Huwendiek, Sissel Guttormsen, Andrea Carolin Lörwald, Martin R. Fischer, Felicitas-Maria Lahner, Roger Kropf |
---|---|
Year of publication: | 2020 |
Subject: | Models, Educational; Psychometrics; 610 Medicine & health; 01 natural sciences; 010305 fluids & plasmas; Education; Classical test theory; 03 medical and health sciences; 0103 physical sciences; Linear regression; Item response theory; Statistics; Range (statistics); Humans; Reliability (statistics); Multiple choice; Mathematics; business.industry; 030503 health policy & services; Multiple choice exams; Reproducibility of Results; Conditional reliability; Reliability; Measurement precision; Logistic Models; Test Taking Skills; Scale (social sciences); Original Article; Clinical Competence; Educational Measurement; 0305 other medical science; business; Quality assurance; Switzerland |
Source: | Perspectives on Medical Education. Lahner, Felicitas-Maria; Schauber, Stefan; Lörwald, Andrea Carolin; Kropf, Roger; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören (2020). Measurement precision at the cut score in medical multiple choice exams: Theory matters. Perspectives on Medical Education, 9(4), pp. 220-228. Springer. DOI: 10.1007/s40037-020-00586-0 |
ISSN: | 2212-277X 2212-2761 |
Description: | Introduction: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. One concept for analyzing measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT), with a special focus on the cut score and on potential factors influencing conditional reliability at the cut score. Methods: We analyzed 32 multiple-choice exams from three Swiss medical schools, comparing conditional reliability at the cut score in IRT and CTT. Additionally, we used multiple regression to analyze potential influencing factors such as the range of examinees’ performance, year of study, and number of items. Results: In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT than in CTT. It was influenced by the range of examinees’ performance and the number of items. This influence was more pronounced in CTT. Discussion: We found that conditional reliability shows inverse distributions, and yields different conclusions regarding the measurement precision at the cut score, depending on the theory used. As the use of IRT seems to be more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments. |
Database: | OpenAIRE |
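
The contrast reported in the abstract (CTT precision highest at the extremes, IRT precision highest in the middle of the scale) can be illustrated with a minimal sketch. It is not the authors' analysis. It assumes Lord's binomial error model for the CTT conditional standard error, a Rasch model for IRT, and the information-based form of conditional reliability with a latent variance of 1. The function names, the 100-item test, and the item difficulties are hypothetical.

```python
import math


def ctt_conditional_sem(raw_score, n_items):
    """Conditional SEM under Lord's binomial error model (an assumption here).

    The value is 0 at raw scores of 0 and n_items and peaks at n_items / 2,
    so CTT precision is lowest for medium scores.
    """
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))


def rasch_information(theta, difficulties):
    """Test information under a Rasch model: sum of p * (1 - p) over items."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return info


def irt_conditional_reliability(theta, difficulties, theta_variance=1.0):
    """Conditional reliability as var / (var + 1 / information).

    Algebraically equal to info / (info + 1 / var). The latent variance of 1
    is an assumed default, not a value taken from the paper.
    """
    info = rasch_information(theta, difficulties)
    return info / (info + 1.0 / theta_variance)


if __name__ == "__main__":
    n_items = 100
    # Hypothetical item difficulties, evenly spread from -2 to 2 logits.
    difficulties = [-2.0 + 4.0 * i / (n_items - 1) for i in range(n_items)]

    print("CTT conditional SEM by raw score (smaller means more precise):")
    for score in (10, 30, 50, 70, 90):
        print(f"  score {score:3d}: {ctt_conditional_sem(score, n_items):.2f}")

    print("IRT conditional reliability by ability (larger means more precise):")
    for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
        rel = irt_conditional_reliability(theta, difficulties)
        print(f"  theta {theta:+.1f}: {rel:.3f}")
```

Under these assumptions, a cut score near the middle of the scale falls where the binomial SEM is largest but where Rasch test information peaks, which is the kind of divergence the abstract describes.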