Abstract: |
Purpose: To describe an artificial intelligence platform that detects thyroid eye disease (TED). Design: Development of a deep learning model. Methods: A deep learning model was trained on 1944 photographs from a clinical database. An additional 344 images (the 'test set') were used to calculate performance metrics. Receiver operating characteristic curves, precision-recall curves and heatmaps were generated. From the test set, 50 images were randomly selected (the 'survey set') and used to compare model performance with ophthalmologist performance. A further 222 images from a separate clinical database were used to assess model recall and to quantify model performance with respect to disease stage and grade. Results: On the test set, the model achieved an accuracy of 89.2%, specificity of 86.9%, recall of 93.4%, precision of 79.7% and an F1 score of 86.0%. Heatmaps demonstrated that the model identified pixels corresponding to clinical features of TED. On the survey set, the ensemble model achieved accuracy, specificity, recall, precision and F1 score of 86%, 84%, 89%, 77% and 82%, respectively, whereas 27 ophthalmologists achieved mean values of 75%, 82%, 63%, 72% and 66%, respectively. On the second test set, the model achieved a recall of 91.9%, with higher recall for moderate-to-severe disease (98.2%, n=55) and active disease (98.3%, n=60) than for mild disease (86.8%, n=68) or stable disease (85.7%, n=63). Conclusions: The deep learning classifier is a novel approach to identifying TED and is a first step in the development of tools to improve diagnostic accuracy and lower barriers to specialist evaluation. [ABSTRACT FROM AUTHOR] |
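For readers who want to see how the five reported test set metrics relate to one another, the following is a minimal sketch in Python of the standard confusion-matrix definitions. The abstract does not report a confusion matrix; the counts used here (114 true positives, 8 false negatives, 29 false positives, 193 true negatives, totalling 344 images) are hypothetical values back-calculated so that the formulas reproduce the reported percentages, and they should not be read as the authors' data. The names `ConfusionCounts` and `binary_metrics` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (TED = positive class)."""
    tp: int  # TED images classified as TED
    fn: int  # TED images classified as non-TED
    fp: int  # non-TED images classified as TED
    tn: int  # non-TED images classified as non-TED


def binary_metrics(c: ConfusionCounts) -> dict:
    """Return the standard metrics as fractions in [0, 1]."""
    total = c.tp + c.fn + c.fp + c.tn
    recall = c.tp / (c.tp + c.fn)        # sensitivity
    precision = c.tp / (c.tp + c.fp)     # positive predictive value
    return {
        "accuracy": (c.tp + c.tn) / total,
        "specificity": c.tn / (c.tn + c.fp),
        "recall": recall,
        "precision": precision,
        # F1 is the harmonic mean of precision and recall
        "f1": 2 * precision * recall / (precision + recall),
    }


# ASSUMPTION: hypothetical counts inferred from the rounded percentages
# in the abstract; the abstract itself does not report these numbers.
assumed_test_set = ConfusionCounts(tp=114, fn=8, fp=29, tn=193)

if __name__ == "__main__":
    for name, value in binary_metrics(assumed_test_set).items():
        print(f"{name}: {100 * value:.1f}%")
```

Because F1 is the harmonic mean of precision and recall, it falls between the two and sits closer to the lower value, which is consistent with the reported F1 lying between the reported precision and recall.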