Impact of different study populations on reader behavior and performance metrics: initial results
Author: | Elodia B. Cole, Kyle J. Myers, Etta D. Pisano, Brandon D. Gallas |
---|---|
Publication year: | 2017 |
Subject: |
medicine.medical_specialty; education.field_of_study; Digital mammography; Data collection; Receiver operating characteristic; medicine.diagnostic_test; Recall; business.industry; Population; Workload; Article; 030218 nuclear medicine & medical imaging; 03 medical and health sciences; 0302 clinical medicine; 030220 oncology & carcinogenesis; medicine; Mammography; Medical physics; Decision threshold; business; education; Simulation |
Source: | Medical Imaging 2017: Image Perception, Observer Performance, and Technology Assessment. |
ISSN: | 0277-786X |
DOI: | 10.1117/12.2255977 |
Description: | The FDA recently completed a study, called VIPER, on design methodologies surrounding the Validation of Imaging Premarket Evaluation and Regulation. VIPER consisted of five large reader sub-studies designed to compare the impact of different study populations on reader behavior, as measured by sensitivity, specificity, and AUC (the area under the receiver operating characteristic, or ROC, curve). The study investigated different prevalence levels and two ways of sampling non-cancer patients: a screening population and a challenge population. VIPER compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with heterogeneously dense or extremely dense breasts. All cases and corresponding images were sampled from the Digital Mammographic Imaging Screening Trial (DMIST) archives. Each sub-study had 20 readers (American Board Certified radiologists). Instead of every reader reading every case (a fully-crossed study), readers and cases were split into groups to reduce reader workload and the total number of observations (a split-plot study). For data collection, readers first decided whether or not they would recall a patient. Following that decision, they provided an ROC score indicating how close to or far from the recall decision threshold that patient was. Performance results for FFDM show that as prevalence increases to 50%, there is a moderate increase in sensitivity and a decrease in specificity, whereas AUC is mainly flat. Regarding precision, the statistical efficiency (ratio of variances) of sensitivity and specificity relative to AUC is 0.66 at best and decreases with prevalence. Analyses comparing the modalities and the study populations (screening vs. challenge) are still ongoing. |
Database: | OpenAIRE |
External link: |
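
The description above reports three reader performance metrics: sensitivity and specificity, which come from the binary recall decision, and AUC, which comes from the ROC score. As a rough illustration of how the two kinds of data lead to these metrics, the following minimal sketch computes them for a single hypothetical reader. The toy data, the 0 to 100 score scale, and the function names are assumptions made for this example; they are not taken from the VIPER study, and the AUC here is the simple empirical (Mann-Whitney) estimate, which may differ from the estimator the authors used.

```python
# Hypothetical illustration only: toy data and function names are assumptions,
# not the VIPER study's data or analysis code.

def sensitivity(recalled, has_cancer):
    """Fraction of cancer cases that the reader recalled."""
    true_pos = sum(1 for r, c in zip(recalled, has_cancer) if c and r)
    n_cancer = sum(1 for c in has_cancer if c)
    return true_pos / n_cancer


def specificity(recalled, has_cancer):
    """Fraction of non-cancer cases that the reader did not recall."""
    true_neg = sum(1 for r, c in zip(recalled, has_cancer) if not c and not r)
    n_non_cancer = sum(1 for c in has_cancer if not c)
    return true_neg / n_non_cancer


def empirical_auc(scores, has_cancer):
    """Empirical AUC: probability that a cancer case scores higher than a
    non-cancer case, counting ties as one half (Mann-Whitney statistic)."""
    pos = [s for s, c in zip(scores, has_cancer) if c]
    neg = [s for s, c in zip(scores, has_cancer) if not c]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy case set at 50% prevalence (4 cancers, 4 non-cancers), one reader.
has_cancer = [True, True, True, True, False, False, False, False]
recalled = [True, True, True, False, False, False, True, False]
scores = [90, 75, 60, 40, 10, 30, 65, 20]  # assumed 0-100 ROC score scale

print("sensitivity:", sensitivity(recalled, has_cancer))
print("specificity:", specificity(recalled, has_cancer))
print("AUC:", empirical_auc(scores, has_cancer))
```

Sensitivity and specificity depend on where the reader places the recall threshold, while the empirical AUC depends only on how the scores rank cancer cases against non-cancer cases.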