Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models

Authors: Olivier Kashongwe, Tina Kabelitz, Christian Ammon, Lukas Minogue, Markus Doherr, Pablo Silva Boloña, Thomas Amon, Barbara Amon
Language: English
Year of publication: 2024
Subject:
Source: AgriEngineering, Vol 6, Iss 3, Pp 3427-3442 (2024)
Document type: article
ISSN: 2624-7402
DOI: 10.3390/agriengineering6030195
Description: Missing data and class imbalance hinder the accurate prediction of rare events such as dairy mastitis. Resampling and imputation are employed to handle these problems. These methods are often applied arbitrarily, despite their profound impact on prediction through the changes they make to the data structure. We hypothesize that their use affects the performance of ML models fitted to automated milking systems (AMSs) data for mastitis prediction. We compare three imputation methods, namely simple imputer (SI), multiple imputer (MICE) and linear interpolation (LI), and three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Support Vector Machine SMOTE (SVMSMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEEN). The classifiers were logistic regression (LR), multilayer perceptron (MLP), decision tree (DT) and random forest (RF). We evaluated them with various metrics and compared models with the kappa score. Complete case analysis suited the RF (0.78) better than the other models, for which SI performed best. The DT, RF and MLP performed better with SVMSMOTE. Overall, the RF, DT and MLP performed best, helped by imputation or resampling (SMOTE and SVMSMOTE). We recommend carefully selecting resampling and imputation techniques and comparing them with complete cases before deciding on the preprocessing approach used to test AMS data with ML models.
Database: Directory of Open Access Journals
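The abstract compares models with the kappa score, which matters for rare events because plain accuracy rewards a model that never predicts the minority class. The following is a minimal sketch of Cohen's kappa in plain Python, not the authors' implementation. The function name `cohen_kappa`, the label coding (1 = mastitis, 0 = healthy) and the toy counts are all assumptions made for illustration; they are not data from the article.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected from the label frequencies alone.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    # Observed agreement: fraction of samples where both labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of (share in truth) * (share in prediction).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Degenerate case: a single class in both labelings. Returning 1.0 is
        # a choice made in this sketch; other implementations may differ.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical, imbalanced toy data: 90 healthy (0) and 10 mastitis (1) records.
y_true = [0] * 90 + [1] * 10

# A model that always predicts "healthy" reaches 90 % accuracy but kappa 0.
always_healthy = [0] * 100

# A model with 89 true negatives, 1 false positive, 3 false negatives, 7 true positives.
some_model = [0] * 89 + [1] * 1 + [0] * 3 + [1] * 7

kappa_trivial = cohen_kappa(y_true, always_healthy)
kappa_model = cohen_kappa(y_true, some_model)
print(f"always healthy: kappa = {kappa_trivial:.3f}")
print(f"some model:     kappa = {kappa_model:.3f}")
```

On the toy data, the always-healthy predictor scores zero despite its high accuracy, which illustrates why a chance-corrected metric is a reasonable basis for comparing preprocessing choices on imbalanced data.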