Data mining sugarcane breeding yield data for ratoon yield prediction
Author: | E. O. Dufrene, James Todd, C. A. Kimbeng, M. J. Pontif, Herman Waguespack, Debbie Boykin |
---|---|
Year of publication: | 2021 |
Subject: |
0106 biological sciences; 0301 basic medicine; biology; Yield (finance); Plant Science; Horticulture; biology.organism_classification; 01 natural sciences; Ratooning; Random forest; Support vector machine; Saccharum; 03 medical and health sciences; 030104 developmental biology; Statistics; Linear regression; Genetics; AdaBoost; Cane; Agronomy and Crop Science; 010606 plant biology & botany |
Source: | Euphytica. 217 |
ISSN: | 1573-5060 0014-2336 |
Description: | Ratooning ability is important for sugarcane (Saccharum spp.), but because of time constraints, relevant third ratoon yield data are not gathered before breeding selections are made. If third ratoon yield could be predicted, selections would improve. Machine learning (ML) techniques can build such predictors from yield variables. Yield variables from 11 locations, 24 genotypes, and 20 cycles were used in a model to predict third ratoon cane yield, cross-validated by cycle, using ML and statistical techniques including Linear Regression, Random Forest, AdaBoost, Stochastic Gradient, Neural Network, Support Vector Machines, and k-nearest neighbors. Prediction error was measured as the difference between predicted and measured third ratoon cane yield. A model that partitioned overall prediction error into sources of variance indicated that location within cycle (52%), followed by genotype by location within cycle (10%), were the largest sources of error for predicting ratooning ability. Ratoon stalk number, sucrose, and cane yield ranked highly as contributors to the predictions. The AdaBoost (AB) predictors of third ratoon cane yield had lower experimental error than second ratoon yield used as a third ratoon predictor. However, prediction error by cycle was not consistently lower than with second ratoon as the predictor. When unique locations were removed, AB predictions had less error and were better predictors of third ratoon yield than second ratoon in 75% of the cycles tested, demonstrating both the potential and the limitations of using harvest and location data for ML predictions. |
Database: | OpenAIRE |
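
The evaluation scheme described in the abstract (hold out one cycle at a time, predict third ratoon cane yield, and compare the error against simply using second ratoon yield as the predictor) can be sketched roughly as follows. This is not the authors' code. It assumes made-up synthetic records, a single predictor, and an ordinary least-squares line swapped in for AdaBoost so that the example needs only the Python standard library. All names (`make_synthetic_records`, `fit_line`, `leave_one_cycle_out`) and the numeric constants are hypothetical.

```python
import random
from statistics import mean


def make_synthetic_records(n_cycles=6, plots_per_cycle=30, seed=0):
    """Generate made-up yield records; none of these values come from the study."""
    rng = random.Random(seed)
    records = []
    for cycle in range(n_cycles):
        cycle_shift = rng.gauss(0, 3)  # arbitrary cycle-level effect
        for _ in range(plots_per_cycle):
            second = rng.uniform(50, 110)
            # Assumed relationship: third ratoon yield declines relative to second.
            third = 0.8 * second - 5 + cycle_shift + rng.gauss(0, 4)
            records.append({"cycle": cycle, "second": second, "third": third})
    return records


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leave_one_cycle_out(records):
    """Return {cycle: (model_mae, baseline_mae)}, holding out each cycle once."""
    results = {}
    for held_out in sorted({r["cycle"] for r in records}):
        train = [r for r in records if r["cycle"] != held_out]
        test = [r for r in records if r["cycle"] == held_out]
        intercept, slope = fit_line(
            [r["second"] for r in train], [r["third"] for r in train]
        )
        # Model error: fitted line applied to the held-out cycle.
        model_mae = mean(
            abs(intercept + slope * r["second"] - r["third"]) for r in test
        )
        # Baseline error: second ratoon yield used directly as the prediction.
        baseline_mae = mean(abs(r["second"] - r["third"]) for r in test)
        results[held_out] = (model_mae, baseline_mae)
    return results


records = make_synthetic_records()
results = leave_one_cycle_out(records)
wins = sum(model < base for model, base in results.values())
for cycle, (model, base) in results.items():
    print(f"cycle {cycle}: model MAE {model:.2f}, baseline MAE {base:.2f}")
print(f"model beat baseline in {wins} of {len(results)} cycles")
```

Grouping the folds by cycle keeps every record from the held-out cycle out of the training set, which is what lets the per-cycle comparison against the second ratoon baseline be read as a fraction of cycles won, in the spirit of the 75% figure reported above.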