Reliability of Trachoma Clinical Grading—Assessing Grading of Marginal Cases

Author: Sintayehu Gebreselassie, Salman Rahman, Abdou Amza, Travis C. Porco, Bruce D. Gaynor, Boubacar Kadri, Joseph P. Sheehan, Thomas M. Lietman, Nassirou Baido, Nicole E. Stoller, Sun N. Yu, Jeremy D. Keenan
Contributors: Vinetz, Joseph M
Year of publication: 2014
Subject:
Bacterial Diseases
Pediatrics
medicine.medical_specialty
lcsh:Arctic medicine. Tropical medicine
lcsh:RC955-962
Concordance
Eye Infections
education
Global Health
Medical and Health Sciences
Cohen's kappa
Clinical Research
Tropical Medicine
Medicine and Health Sciences
Photography
medicine
Humans
Public and Occupational Health
Child, Preschool
Grading (tumors)
Randomized Controlled Trials as Topic
Trachoma
Observer Variation
business.industry
lcsh:Public aspects of medicine
Infant, Newborn
Public Health, Environmental and Occupational Health
Reproducibility of Results
Neglected Diseases
lcsh:RA1-1270
Biological Sciences
Tropical Diseases
medicine.disease
Ophthalmology
Good Health and Well Being
Infectious Diseases
business
Conjunctiva
Kappa
Research Article
Neglected Tropical Diseases
Clinical psychology
Source: PLoS Neglected Tropical Diseases, Vol 8, Iss 5, p e2840 (2014)
Rahman, SA; Yu, SN; Amza, A; Gebreselassie, S; Kadri, B; Baido, N; et al.(2014). Reliability of Trachoma Clinical Grading-Assessing Grading of Marginal Cases. PLoS Neglected Tropical Diseases, 8(5). doi: 10.1371/journal.pntd.0002840. UC San Francisco: Retrieved from: http://www.escholarship.org/uc/item/2sc8805v
ISSN: 1935-2735
DOI: 10.1371/journal.pntd.0002840
Description:

Background: Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess performance of trainees on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score.

Methods and Findings: We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation - follicular (TF) and trachomatous inflammation - intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we assessed the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases.

Conclusions: The kappa statistic is widely used for certifying trachoma field graders. Exclusion of borderline cases, on which even experienced graders disagree, increases apparent agreement as measured by the kappa statistic. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.
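To make the certification measure concrete, here is a minimal sketch (not the authors' code) of Cohen's kappa for binary presence/absence grades, comparing a hypothetical trainee against a consensus standard; the grade lists are invented for illustration.

```python
def cohens_kappa(grades_a, grades_b):
    """Cohen's kappa for two equal-length lists of binary grades (0/1)."""
    n = len(grades_a)
    # Observed agreement: fraction of cases where the two graders match.
    p_o = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement, from each grader's marginal rate of calling "1".
    p1_a = sum(grades_a) / n
    p1_b = sum(grades_b) / n
    p_e = p1_a * p1_b + (1 - p1_a) * (1 - p1_b)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical TF grades on 10 photographs (1 = TF present).
consensus = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
trainee   = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
print(round(cohens_kappa(consensus, trainee), 3))
```

Kappa rescales observed agreement by the agreement expected from each grader's marginal rates alone, which is why dropping borderline cases (where disagreement concentrates) inflates the statistic.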
Author Summary: Trachoma is the leading infectious cause of blindness, and the World Health Organization plans to eliminate it as a public health concern worldwide by the year 2020. This effort in large part involves mass oral antibiotic distributions to communities. A simplified trachoma grading scale is used to assess the presence of active infection. Field workers must be properly trained and certified to perform these eye exams because their findings inform when to start and stop community-wide antibiotic treatments. Certification involves measuring agreement in trachoma grades between a trainee and an experienced grader on a test-set of trachoma photographs. Often, these test-sets have hard-to-grade cases of trachoma removed. We found that removing these borderline cases inflates agreement. Including these borderline cases in the test-set allows a more realistic estimate of agreement, but it is still difficult to assess a trainee's grades for cases on which even experts disagree. We found that reliability error, a measure derived from the decomposition of the Brier score (the mean squared error of a set of forecasts), can be used to assess a trainee's evaluation of these borderline cases.
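The reliability term mentioned above comes from the Murphy decomposition of the Brier score. The sketch below is an assumption-laden illustration (not the paper's exact procedure): probabilistic grades are binned by forecast value, and the size-weighted squared gap between each bin's forecast and its observed disease frequency gives the reliability error, which is zero for a perfectly calibrated grader.

```python
from collections import defaultdict

def reliability_error(forecasts, outcomes):
    """Reliability term of the Brier score decomposition (0 = calibrated)."""
    n = len(forecasts)
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)                # group outcomes by forecast value
    rel = 0.0
    for f, obs in bins.items():
        o_bar = sum(obs) / len(obs)      # observed frequency in this bin
        rel += len(obs) * (f - o_bar) ** 2
    return rel / n

# Hypothetical trainee who over-calls TF: grades 0.8 on cases where
# TF is actually present only half the time.
forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2]
outcomes  = [1,   0,   1,   0,   0,   0]
print(round(reliability_error(forecasts, outcomes), 3))
```

A systematic over-call or under-call shifts every bin's forecast away from its observed frequency in the same direction, so the sign and size of the per-bin gaps indicate how an individual trainee's grading should be refined.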
Database: OpenAIRE