Popis: |
For all animals, the decision to explore comes with the risk of getting less reward. For example, a foraging bee might find less nectar, or a hunting hawk less prey. This loss is often formalized as regret. It has been mathematically proven that exploring an uncertain world in pursuit of a specific goal always incurs some regret, which is why exploration-exploitation can be a dilemma. Given this proof, we wondered whether the common advice to “focus on learning and not the goal” might have mathematical merit. So we re-imagined exploration in the dilemma as an open-ended search for any new information. We then developed a new, minimal description of information value, which generalizes existing ideas like curiosity, novelty, and information gain. We used this description to model the dilemma as a competition between two strategies, one maximizing reward and one maximizing information, each acting independently. Here we prove this competition has a no-regret solution. When we study this solution in simulation, using classic bandit tasks, it outperforms standard approaches, especially when rewards are sparse.
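
To make the idea of competing strategies concrete, below is a minimal, hypothetical Python sketch of a bandit agent in this spirit: one policy greedily maximizes estimated reward, another greedily maximizes an information value (crudely approximated here as how much the latest observation changed the agent’s estimate), and the agent explores only while that information value stays above a small threshold. The names (`run_bandit`, `eta`, the info measure) and the threshold rule are illustrative assumptions, not the exact algorithm or proof from the paper.

```python
# Illustrative sketch only: a bandit agent in which a reward-maximizing
# policy and an information-maximizing policy compete, instead of mixing
# exploration noise into a single reward objective. Names and the
# threshold rule are assumptions for illustration.
import random

def run_bandit(true_probs, n_trials=1000, eta=1e-3, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # visits per arm
    values = [0.0] * n_arms  # running reward estimate per arm
    info = [1.0] * n_arms    # crude information value: expected change in memory
    total_reward = 0

    for _ in range(n_trials):
        # Competition: pursue information while any arm still promises
        # more than the threshold eta; otherwise act greedily on reward.
        if max(info) > eta:
            arm = info.index(max(info))      # explore: maximize information
        else:
            arm = values.index(max(values))  # exploit: maximize reward

        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward

        # Update the reward estimate; the size of the update stands in
        # for the information gained by this visit.
        counts[arm] += 1
        old = values[arm]
        values[arm] += (reward - old) / counts[arm]
        info[arm] = abs(values[arm] - old)

    return total_reward, values

if __name__ == "__main__":
    # A sparse-reward example: two poor arms and one good arm.
    print(run_bandit([0.1, 0.2, 0.8]))
```

In this sketch the two objectives are never blended: exploration ends on its own once no arm is expected to teach the agent anything above the boundary eta, after which the agent exploits deterministically.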