Assessing different cross-validation schemes for predicting novel traits using sensor data: An application to dry matter intake and residual feed intake using milk spectral data

Authors: A. Yilmaz Adkinson, M. Abouhawwash, M.J. VandeHaar, K.L. Parker Gaddis, J. Burchard, F. Peñagaricano, H.M. White, K.A. Weigel, R. Baldwin, J.E.P. Santos, J.E. Koltes, R.J. Tempelman
Language: English
Year of publication: 2024
Subject:
Source: Journal of Dairy Science, Vol 107, Iss 10, Pp 8084-8099 (2024)
Document type: article
ISSN: 0022-0302
DOI: 10.3168/jds.2024-24701
Description: ABSTRACT: Feed efficiency is important for economic profitability of dairy farms; however, recording daily DMI is expensive. Our objective was to investigate the potential use of milk mid-infrared (MIR) spectral data to predict proxy phenotypes for DMI based on different cross-validation schemes. We were specifically interested in comparisons between a model that included only MIR data (model M1); a model that incorporated different energy sink predictors, such as body weight, body weight change, and milk energy (model M2); and an extended model that incorporated both energy sinks and MIR data (model M3). Models M2 and M3 also included various cow-level variables (stage of lactation, age at calving, parity) such that any improvement in model performance from M2 to M3, whether through a smaller root mean squared error (RMSE) or a greater squared predictive correlation (R²), could indicate a potential benefit of MIR to predict residual feed intake. The data used in our study originated from a multi-institutional project on the genetics of feed efficiency in US Holsteins. Analyses were conducted on 2 different trait definitions based on different period lengths: averaged across weeks versus averaged across 28 d. Specifically, there were 19,942 weekly records on 1,812 cows across 46 experiments or cohorts and 3,724 28-d records on 1,700 cows across 43 different cohorts. The cross-validation analyses involved 3 different k-fold schemes. First, a 10-fold cow-independent cross-validation was conducted whereby all records from any one cow were kept together in either training or test sets. Similarly, a 10-fold experiment-independent cross-validation kept entire experiments together, whereas a 4-fold herd-independent cross-validation kept entire herds together in either training or test sets. Based on cow-independent cross-validation for both weekly and 28-d DMI, adding MIR predictors to energy sinks (model M3 vs. M2) significantly (P < 10⁻¹⁰) reduced average RMSE to 1.59 kg and increased average R² to 0.89. However, adding MIR to energy sinks (M3) to predict DMI either within an experiment-independent or herd-independent cross-validation scheme seemed to demonstrate no merit (P > 0.05) compared with an energy sink model (M2) for either R² or RMSE (respectively, 0.68 and 2.55 kg for M2 in herd-independent scheme). We further noted that with broader cross-validation schemes (i.e., from cow-independent to experiment-independent to herd-independent schemes), the mean and slope bias increased. Given that proxy DMI phenotypes for cows would need to be almost entirely generated in herds having no DMI or training data of their own, herd-independent cross-validation assessments of predictive performance should be emphasized. Hence, more research on predictive algorithms suitable for broader cross-validation schemes and a more earnest effort on calibration of spectrophotometers against each other should be considered.
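Note: The three grouped k-fold schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names (`group_k_fold`, `rmse`, `squared_correlation`) and the toy cow, experiment, and herd labels are hypothetical and chosen only for illustration. The sketch assumes the usual definitions of RMSE and of the squared Pearson correlation between observed and predicted values.

```python
import math
import random


def group_k_fold(groups, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which all records that share
    a group label fall on the same side of the split."""
    unique = sorted(set(groups))
    if k > len(unique):
        raise ValueError("k cannot exceed the number of distinct groups")
    random.Random(seed).shuffle(unique)
    # Deal the shuffled groups round-robin into k folds.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


def rmse(observed, predicted):
    """Root mean squared error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy labels (not data from the study): 60 cows with 2 records
# each, nested in 20 experiments, nested in 4 herds.
cows = [c for c in range(60) for _ in range(2)]
experiments = [c // 3 for c in cows]
herds = [e // 5 for e in experiments]

schemes = {
    "cow-independent": (cows, 10),
    "experiment-independent": (experiments, 10),
    "herd-independent": (herds, 4),
}
for name, (labels, k) in schemes.items():
    for train, test in group_k_fold(labels, k):
        # No group may contribute records to both training and test sets.
        assert not {labels[i] for i in train} & {labels[i] for i in test}
    print(f"{name}: {k} folds, no group shared between training and test")
```

Only the grouping label changes between schemes. Moving from cow to experiment to herd makes each test fold less similar to its training data, which is the sense in which the abstract calls the later schemes broader.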
Database: Directory of Open Access Journals