A Penalized Regression Method for Genomic Prediction Reduces Mismatch between Training and Testing Sets.

Author: Montesinos-López OA; Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico., Pulido-Carrillo CD; Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico., Montesinos-López A; Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico., Larios Trejo JA; Facultad de Ciencias de la Educación, Universidad de Colima, Colima 28040, Mexico., Montesinos-López JC; Department of Public Health Sciences, University of California Davis, Davis, CA 95616, USA., Agbona A; International Institute of Tropical Agriculture (IITA), Ibadan 200113, Nigeria.; Molecular & Environmental Plant Sciences, Texas A&M University, College Station, TX 77843, USA., Crossa J; International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco 52640, Mexico.; Louisiana State University, Baton Rouge, LA 70803, USA.; Distinguished Scientist Fellowship Program and Department of Statistics and Operations Research, King Saud University, Riyadh 11451, Saudi Arabia.; Colegio de Postgraduados, Montecillos 56230, Mexico.
Language: English
Source: Genes [Genes (Basel)] 2024 Jul 23; Vol. 15 (8). Date of Electronic Publication: 2024 Jul 23.
DOI: 10.3390/genes15080969
Abstract: Genomic selection (GS) is changing plant breeding by significantly reducing the resources needed for phenotyping. However, its accuracy can be compromised by mismatches between training and testing sets, which reduce efficiency when the predictive model does not adequately reflect the genetic and environmental conditions of the target population. To address this challenge, this study introduces a simple method that uses binary Lasso regression to estimate β coefficients. In this approach, the response variable is coded 1 for testing-set inputs and 0 for training-set inputs. The inverse absolute values of these β coefficients are then used as weights when training Lasso, Ridge, and Elastic Net regression models (WLasso, WRidge, and WElastic Net). This weighting gives less importance to features that discriminate more strongly between the training and testing sets. The method is evaluated on six datasets and shows consistent improvements in normalized root mean square error (NRMSE). Importantly, the method is easy to implement with the glmnet library, which directly supports weighting β coefficients.
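The weighting scheme described in the abstract can be sketched in plain Python. This is not the authors' code, and it rests on several assumptions. First, a least-squares Lasso fitted to the 0/1 train/test label stands in for the binary (logistic) Lasso. Second, the abstract does not say exactly how the weights 1/|β_j| enter training. The sketch treats them as feature rescaling, which is equivalent to a per-feature penalty factor of |β_j| + ε. Under that reading, features that separate the training and testing sets more strongly are shrunk more, as the abstract states. Third, the small constant `eps`, all function names, and the NRMSE normalization (RMSE divided by the observed mean) are hypothetical choices made for illustration. A glmnet-style coordinate-descent objective covers the weighted Lasso (`alpha=1`), Ridge (`alpha=0`), and Elastic Net (values in between).

```python
import math


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_elastic_net(X, y, lam, penalty, alpha=1.0, n_iter=200):
    """Cyclic coordinate descent for the glmnet-style objective
    (1/2n)||y - b0 - Xb||^2 + lam * sum_j penalty[j] * (alpha*|b_j| + (1-alpha)/2 * b_j^2).
    alpha=1 -> weighted Lasso, alpha=0 -> weighted Ridge, in between -> weighted Elastic Net.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            sq = sum(v * v for v in col) / n
            denom = sq + lam * penalty[j] * (1.0 - alpha)
            if denom == 0.0:
                continue  # all-zero column with a pure L1 penalty: b_j stays 0
            # Correlation of feature j with the partial residual (current b_j added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + sq * b[j]
            new = soft_threshold(rho, lam * penalty[j] * alpha) / denom
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                b[j] = new
        # The unpenalized intercept absorbs the mean residual.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, b


def discriminating_betas(X_train, X_test, lam):
    """Step 1: label training rows 0 and testing rows 1, then fit a Lasso to that label.
    ASSUMPTION: a least-squares Lasso on the 0/1 label stands in for a logistic one."""
    X = X_train + X_test
    z = [0.0] * len(X_train) + [1.0] * len(X_test)
    _, beta = weighted_elastic_net(X, z, lam, [1.0] * len(X[0]), alpha=1.0)
    return beta


def mismatch_penalties(beta, eps=1e-3):
    """Step 2: turn the discriminator's coefficients into per-feature penalty factors.
    ASSUMPTION: weight w_j = 1 / (|beta_j| + eps) rescales feature j, which is equivalent
    to a penalty factor 1 / w_j = |beta_j| + eps, so features that separate the training
    and testing sets more strongly are shrunk more."""
    return [abs(bj) + eps for bj in beta]


def predict(b0, b, X):
    """Linear predictor b0 + x^T b for each row of X."""
    return [b0 + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def nrmse(y_obs, y_pred):
    """ASSUMPTION: RMSE divided by the mean of the observed values."""
    n = len(y_obs)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)
    return rmse / (sum(y_obs) / n)
```

A typical call sequence is `beta = discriminating_betas(X_train, X_test, lam)`, then `pf = mismatch_penalties(beta)`, then `weighted_elastic_net(X_train, y_train, lam, pf, alpha)`. The sketch is pure Python so that it stays dependency-free. In R's glmnet, per-feature factors like `pf` would be passed through the `penalty.factor` argument, which glmnet rescales internally to sum to the number of variables.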
Database: MEDLINE