Learning Optimal Policies in Markov Decision Processes with Value Function Discovery

Authors:	Martijn Onderwater, Sandjai Bhulai, Rob van der Mei
Contributors:	Stochastics, Directie
Language:	English
Year of publication:	2015
Subject:	Structure (mathematical logic); Mathematical optimization; Computer Networks and Communications; Computer science; Markov processes; Evolutionary algorithm; Genetic programming; Dynamic programming; Field (computer science); Hardware and Architecture; Bellman equation; Feature (machine learning); Markov decision process; Value (mathematics); Software
Source:	ACM SIGMETRICS Performance Evaluation Review, 43(2), 7-9
ISSN:	0163-5999
Description:	In this paper we describe recent progress in our work on Value Function Discovery (VFD), a novel method for discovering value functions for Markov Decision Processes (MDPs). In a previous paper we described how VFD discovers algebraic descriptions of value functions (and the corresponding policies) using ideas from the field of evolutionary algorithms. A special feature of VFD is that the descriptions include the model parameters of the MDP. We extend that work and show how additional information about the structure of the MDP can be included in VFD. This alternative use of VFD still yields near-optimal policies and is much faster. Besides increased performance and improved run times, this approach illustrates that VFD is not restricted to learning value functions and can be applied more generally.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b3ee2d73bf4d82f568f7240ad23f0b8c https://ir.cwi.nl/pub/23827