Reliability of Trachoma Clinical Grading—Assessing Grading of Marginal Cases
Author: | Sintayehu Gebreselassie, Salman Rahman, Abdou Amza, Travis C. Porco, Bruce D. Gaynor, Boubacar Kadri, Joseph P. Sheehan, Thomas M. Lietman, Nassirou Baido, Nicole E. Stoller, Sun N. Yu, Jeremy D. Keenan |
---|---|
Contributors: | Vinetz, Joseph M |
Publication year: | 2014 |
Subject: |
Bacterial Diseases; Pediatrics; Arctic medicine; Tropical Medicine; Concordance; Eye Infections; Education; Global Health; Medical and Health Sciences; Cohen's kappa; Clinical Research; Medicine and Health Sciences; Photography; Humans; Public and Occupational Health; Child, Preschool; Grading; Randomized Controlled Trials as Topic; Trachoma; Observer Variation; Public aspects of medicine; Infant, Newborn; Public Health, Environmental and Occupational Health; Reproducibility of Results; Infant; Neglected Diseases; Biological Sciences; Tropical Diseases; Ophthalmology; Good Health and Well Being; Infectious Diseases; Conjunctiva; Kappa; Research Article; Neglected Tropical Diseases; Clinical psychology |
Source: | PLoS Neglected Tropical Diseases, Vol 8, Iss 5, p e2840 (2014). Rahman SA, Yu SN, Amza A, Gebreselassie S, Kadri B, Baido N, et al. (2014). Reliability of Trachoma Clinical Grading: Assessing Grading of Marginal Cases. PLoS Neglected Tropical Diseases, 8(5). doi: 10.1371/journal.pntd.0002840. UC San Francisco eScholarship: http://www.escholarship.org/uc/item/2sc8805v |
ISSN: | 1935-2735 |
DOI: | 10.1371/journal.pntd.0002840 |
Description: |
Background: Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable, and there are cases on which even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess trainee performance on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score.
Methods and Findings: We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation—follicular (TF) and trachomatous inflammation—intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we examined the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases.
Conclusions: The kappa statistic is widely used for certifying trachoma field graders. Excluding borderline cases, on which even experienced graders disagree, inflates apparent agreement as measured by kappa. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.
Author Summary: Trachoma is the leading infectious cause of blindness, and the World Health Organization plans to eliminate it as a public health concern worldwide by the year 2020. This effort in large part involves mass oral antibiotic distributions to communities. A simplified trachoma grading scale is used to assess the presence of active infection. Field workers must be properly trained and certified to perform these eye exams because their findings inform when to start and stop community-wide antibiotic treatments. Certification involves measuring agreement in trachoma grades between a trainee and an experienced grader on a test set of trachoma photographs. Often, these test sets have hard-to-grade cases of trachoma removed. We found that removing these borderline cases inflates agreement. Including borderline cases in the test set yields a more realistic estimate of agreement, but it remains difficult to assess a trainee's grades for cases on which even experts disagree. We found that reliability error, a measure derived from the decomposition of the Brier score (the mean squared error of a set of forecasts), can be used to assess a trainee's evaluation of these borderline cases. |
Database: | OpenAIRE |
External link: |
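The description above rests on two measures: Cohen's kappa for chance-corrected inter-grader agreement, and the reliability term of the Murphy decomposition of the Brier score (BS = reliability − resolution + uncertainty). The sketch below is purely illustrative and is not the authors' analysis code; the function names and data are hypothetical, and grades are assumed to be coded as binary 0/1 values:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary graders (equal-length lists of 0/1)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # Expected chance agreement from each grader's marginal rate of 1s
    pa1, pb1 = sum(a) / n, sum(b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_o - p_e) / (1 - p_e)

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score for binary outcomes.

    Returns (reliability, resolution, uncertainty), where
    BS = reliability - resolution + uncertainty.
    forecasts: probabilities in [0, 1]; outcomes: 0/1.
    """
    n = len(forecasts)
    base = sum(outcomes) / n  # overall base rate
    # Group outcomes by distinct forecast value (one "bin" per value)
    bins = {}
    for f, o in zip(forecasts, outcomes):
        bins.setdefault(f, []).append(o)
    rel = sum(len(os) * (f - sum(os) / len(os)) ** 2
              for f, os in bins.items()) / n
    res = sum(len(os) * (sum(os) / len(os) - base) ** 2
              for f, os in bins.items()) / n
    unc = base * (1 - base)
    return rel, res, unc
```

A trainee who systematically over-calls TF would show a large reliability term: their forecast rate within a bin drifts away from the observed (consensus) frequency, which is exactly the miscalibration the paper uses to refine individual grading.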