Bias and Variance Analysis of Contemporary Symbolic Regression Methods

Authors: Lukas Kammerer, Gabriel Kronberger, Stephan Winkler
Language: English
Year of publication: 2024
Subject:
Source: Applied Sciences, Vol 14, Iss 23, p 11061 (2024)
Document type: article
ISSN: 2076-3417
DOI: 10.3390/app142311061
Description: Symbolic regression is commonly used in domains where both high accuracy and interpretability of models are required. While symbolic regression is capable of producing highly accurate models, small changes in the training data might cause highly dissimilar solutions. The implications in practice are huge, as interpretability, a key selling point, degrades when minor changes in data cause substantially different behavior of models. We analyse those perturbations caused by changes in training data for ten contemporary symbolic regression algorithms. We analyse existing machine learning models from the SRBench benchmark suite, a benchmark that compares the accuracy of several symbolic regression algorithms. We measure the bias and variance of algorithms and show how algorithms like Operon and GP-GOMEA return highly accurate models with similar behavior despite changes in training data. Our results highlight that larger model sizes do not imply different behavior when training data change. On the contrary, larger models effectively prevent systematic errors. We also show how other algorithms like ITEA or AIFeynman, with the declared goal of producing consistent results, live up to their expectation of small and similar models.
Database: Directory of Open Access Journals