A comparison of 4 different machine learning algorithms to predict lactoferrin content in bovine milk from mid-infrared spectra
Author: | Mike Coffey, Sinead McParland, Frédéric Dehareng, Pauline Delhez, Nicolas Gengler, Clément Grelet, Marion Calmels, Anthony Tedde, Hélène Soyeurt |
---|---|
Year of publication: | 2020 |
Subject: |
Mean squared error; Spectrophotometry, Infrared; Calibration (statistics); Machine learning; Standard deviation; Medical and health sciences; Partial least squares regression; Genetics; Animals; Lactation; Udder; Least-Squares Analysis; Developmental biology; Mathematics; Health sciences; Animal and dairy science; Agricultural and veterinary sciences; Dairy & animal science; Data set; Support vector machine; Lactoferrin; Milk; Animal Science and Zoology; Cattle; Female; Artificial intelligence; Somatic cell count; Algorithms; Food Science |
Source: | Journal of Dairy Science. 103(12) |
ISSN: | 1525-3198 |
Description: | Lactoferrin (LF) is a glycoprotein naturally present in milk. Its content varies throughout lactation and also with mastitis; it is therefore a potential additional indicator of udder health beyond somatic cell count. Consequently, there is interest in quantifying this biomolecule routinely. The first prediction equations proposed in the literature to predict LF content in milk from milk mid-infrared spectra were built using partial least squares regression (PLSR) because of the limited size of the data sets. The current study aimed to test 4 different machine learning algorithms using a large data set comprising 6,619 records collected across different herds, breeds, and countries. The first algorithm was a PLSR, as used in past investigations. The second and third algorithms used partial least squares (PLS) factors combined with a linear and a polynomial support vector regression (PLS + SVR). The fourth algorithm also used PLS factors, but fed them into an artificial neural network with 1 hidden layer (PLS + ANN). The training and validation sets comprised 5,541 and 836 records, respectively. Although the calibration performance was the best for PLS + polynomial SVR, its validation performance was the worst. The 3 other algorithms had similar validation performances: their validation root mean squared error (RMSE) ranged between 162.17 and 166.75 mg/L of milk. However, the lower standard deviation of the cross-validation RMSE and the better normality of the residual distribution observed for PLS + ANN suggest that this model was more suitable for predicting the LF content in milk from milk mid-infrared spectra (R2v = 0.60 and validation RMSE = 162.17 mg/L of milk). This PLS + ANN model was then applied to almost 6 million spectral records. The predicted LF showed the expected relationships with milk yield, somatic cell score, somatic cell count, and stage of lactation.
The model tended to underestimate high LF values (higher than 600 mg/L of milk). However, if the prediction threshold was set to 500 mg/L, 82% of the validation samples with an LF content higher than 600 mg/L were detected. Future research should aim to increase the number of those extremely high LF records in the calibration set. |
Database: | OpenAIRE |
External link: |
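
The abstract above reports two kinds of figures: a validation RMSE and the share of high-LF samples (reference above 600 mg/L) that are still detected when predictions are flagged at 500 mg/L. The following is a minimal sketch of how such figures could be computed, not the authors' code. The function names, the strict "greater than" comparisons, and the sample values are assumptions made for illustration only.

```python
import math


def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def detection_rate(reference, predicted, high_cutoff=600.0, flag_threshold=500.0):
    """Share of truly high samples whose prediction is flagged.

    A sample is treated as truly high when its reference value exceeds
    high_cutoff, and as flagged when its prediction exceeds flag_threshold
    (strict comparisons are an assumption). Returns None when no sample
    is truly high.
    """
    high_predictions = [p for r, p in zip(reference, predicted) if r > high_cutoff]
    if not high_predictions:
        return None
    flagged = sum(1 for p in high_predictions if p > flag_threshold)
    return flagged / len(high_predictions)


# Made-up illustrative values in mg/L of milk, not data from the study.
reference = [120.0, 250.0, 640.0, 710.0, 900.0, 630.0]
predicted = [150.0, 230.0, 560.0, 520.0, 480.0, 610.0]

print(f"RMSE: {rmse(reference, predicted):.2f} mg/L")
print(f"Detection rate: {detection_rate(reference, predicted):.2f}")
```

Setting the flag threshold below the cutoff of interest is one way to compensate for a model that underestimates high values, at the cost of flagging more samples whose true content is below the cutoff.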