Practical challenges and potential approaches to predicting low-incidence diseases on farm using individual cow data: A clinical mastitis example

Authors: D.M. Liebe, N.M. Steele, C.S. Petersson-Wolfe, A. De Vries, R.R. White
Language: English
Publication year: 2022
Subject:
Source: Journal of Dairy Science, Vol 105, Iss 3, Pp 2369-2379 (2022)
Document type: article
ISSN: 0022-0302
DOI: 10.3168/jds.2021-20306
Description: ABSTRACT: Clinical mastitis (CM) incidence is considerable in terms of cows affected per year, but cases are much less common in terms of detections per cow per milking. From a modeling perspective, where predictions are made every time any cow is milked, low CM incidence per cow day makes training, evaluating, and applying CM prediction models a challenge. The objective of this study was to build models for predicting CM incidence using time-series sensor data and choose models that maximize net return based on a cost matrix. Data collected from 2 university dairy farms, the University of Florida and Virginia Polytechnic Institute and State University, were used to gather representative data, including 110,156 milkings and 333 CM cases. Variables used in the models were milk yield, protein, lactose, fat, electrical conductivity, days in milk, lactation number, and activity as the number of steps, lying time, lying bouts, and lying bout duration. Models that predicted the likelihood of CM caused by either gram-negative (GN) or gram-positive (GP) bacteria on each day were derived using extreme gradient boosting with weighting favoring true-positive cases, logistic responses, and log-loss errors. Model accuracies were determined using data randomly held out from the training set on each run. All variables considered were in terms of change (slope) over previous days, including the day CM was visually detected. The GN models had a median sensitivity (Se) of 52.6% and specificity (Sp) of 99.8%, whereas the GP models had a median Se of 37.5% and Sp of 99.9% when tested on the held-out data. In our models optimized to reduce cost from predictions, the Se was much less than Sp, suggesting that CM models might benefit from greater model weighting placed on Sp.
Results also highlight the importance of positive predictive value (true-positive cases per predicted positive case) along with Sp and Se, as models built on sparse data tend to predict too many false-positive cases. The calculated partial net returns of our GN and GP models were −$0.15 and −$0.10 per cow per lactation, respectively, whereas International Organization for Standardization (ISO) standard models with Se of 80% and Sp of 99% would return −$1.32 per cow per lactation. Models chosen that minimized the cost to the farmer differed markedly from models that met ISO guidelines, showing asymmetry in targets between Sp and Se when the disease incidence rate is low. Because of the unique challenges that low-incidence diseases like CM present, we recommend that future CM predictive models consider the economic and practical implications in addition to the traditional model evaluation metrics.
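The abstract's point about positive predictive value can be illustrated with a short calculation. The sketch below is not taken from the article: it applies the standard PPV formula to the Se and Sp figures quoted above, and it assumes, purely for illustration, a prevalence equal to the overall ratio of 333 CM cases to 110,156 milkings. The article's models predict GN and GP cases separately and per day, so the actual prevalence they faced would differ. The function name `positive_predictive_value` is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV = true positives / all predicted positives, per unit of population."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# ASSUMPTION: overall cases per milking used as a stand-in prevalence;
# the article does not report the per-day GN or GP prevalence in this abstract.
prevalence = 333 / 110_156

# ISO-style target quoted in the abstract: Se 80%, Sp 99%.
ppv_iso = positive_predictive_value(0.80, 0.99, prevalence)

# Median GN model performance quoted in the abstract: Se 52.6%, Sp 99.8%.
ppv_gn = positive_predictive_value(0.526, 0.998, prevalence)

print(f"assumed prevalence: {prevalence:.5f}")
print(f"PPV at Se=0.80,  Sp=0.99:  {ppv_iso:.3f}")
print(f"PPV at Se=0.526, Sp=0.998: {ppv_gn:.3f}")
```

Under this assumed prevalence, the higher-Sp operating point gives a PPV of roughly 0.44 against roughly 0.20 for the Se 80% / Sp 99% target, even though its Se is lower. This is consistent with the abstract's observation that Sp carries more weight than Se when incidence is low.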
Database: Directory of Open Access Journals