Data mining sugarcane breeding yield data for ratoon yield prediction

Authors: E. O. Dufrene, James Todd, C. A. Kimbeng, M. J. Pontif, Herman Waguespack, Debbie Boykin
Publication year: 2021
Subject:
Source: Euphytica. 217
ISSN: 1573-5060
0014-2336
Description: Ratooning ability is important for sugarcane (Saccharum spp.), but because of time constraints, relevant third-ratoon yield data are not gathered before breeding selections are made. If third-ratoon yield could be predicted, selections would improve. Machine learning (ML) techniques can use yield variables to build predictors. Yield variables from 11 locations, 24 genotypes, and 20 cycles were used in a model to predict third-ratoon cane yield, cross-validated by cycle, using ML and statistical techniques including Linear Regression, Random Forest, AdaBoost, Stochastic Gradient, Neural Network, Support Vector Machines, and k-nearest neighbors. Prediction error was measured as the difference between predicted and measured third-ratoon cane yield. A model that partitioned overall prediction error into sources of variance indicated that location within cycle (52%), followed by genotype by location within cycle (10%), were the largest sources of error for predicting ratooning ability. Ratoon stalk number, sucrose, and cane yield ranked highly as contributors to the predictions. The AdaBoost (AB) ML predictors of third-ratoon cane yield had lower experimental error than second ratoon used as a third-ratoon predictor. However, prediction error by cycle was not consistently lower than that of second ratoon as a predictor. When unique locations were removed, AB predictions had less error and were better predictors of third ratoon than second ratoon in 75% of the cycles tested, demonstrating both the potential and the limitations of using harvest and location data for ML predictions.
Database: OpenAIRE
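
Note: The evaluation scheme described in the abstract (cross-validation by cycle, with second ratoon used directly as the baseline predictor of third ratoon) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: it uses a one-variable least-squares fit (Linear Regression is one of the listed techniques) rather than AdaBoost, mean absolute error as an assumed error summary, and synthetic toy numbers that do not come from the study. The names `fit_line`, `leave_one_cycle_out`, and `records` are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_cycle_out(records):
    """Hold out one cycle at a time, train on the rest.

    records: list of (cycle, second_ratoon_yield, third_ratoon_yield).
    Returns {cycle: (model_mae, baseline_mae)}, where the baseline uses
    the second-ratoon yield itself as the third-ratoon prediction.
    """
    results = {}
    for held_out in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        a, b = fit_line([r[1] for r in train], [r[2] for r in train])
        model_mae = mean(abs(a + b * r[1] - r[2]) for r in test)
        baseline_mae = mean(abs(r[1] - r[2]) for r in test)
        results[held_out] = (model_mae, baseline_mae)
    return results


# Synthetic toy data (assumption, NOT from the study): 5 cycles, 12 plots each.
rng = random.Random(0)
records = []
for cycle in range(1, 6):
    for _ in range(12):
        second = rng.uniform(50, 110)
        third = 0.8 * second + rng.gauss(0, 5)
        records.append((cycle, second, third))

results = leave_one_cycle_out(records)
# Count the cycles in which the fitted model beats the second-ratoon baseline.
wins = sum(model < baseline for model, baseline in results.values())
print(f"model beat baseline in {wins} of {len(results)} cycles")
```

Grouping the folds by cycle, rather than splitting rows at random, keeps all plots from a held-out cycle out of training, which mirrors the "cross-validated by cycle" design the abstract mentions.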