A solving method of an MDP with a constraint by genetic algorithms

Author:	Hajime Kawai, K Hirayama
Year of publication:	2000
Subject:	Computer Science::Machine Learning; Mathematical optimization; Linear programming; Reward-based selection; Process (computing); Computer Science Applications; Constraint (information theory); Range (mathematics); Discrete time and continuous time; Modelling and Simulation; Genetic algorithm; Markov decision process; Algorithm; Mathematics
Source:	Mathematical and Computer Modelling. 31:165-173
ISSN:	0895-7177
DOI:	10.1016/s0895-7177(00)00084-4
Description:	We consider a discrete-time Markov decision process (MDP) with a finite state space, a finite action space, and two kinds of immediate rewards. The problem is to maximize the time-average reward generated by one reward stream, subject to the time-average of the other reward not being smaller than a prescribed value. An MDP with a reward constraint can be solved by linear programming in the range of mixed policies. On the other hand, when we restrict ourselves to pure policies, the problem is a combinatorial problem, for which a solution has not been discovered. In this paper, we propose an approach by Genetic Algorithms (GAs) in order to obtain an effective search process and to obtain a near-optimal, possibly optimal, pure stationary policy. A numerical example is given to examine the efficiency of the proposed approach.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::49ab6ccfd58d47427092bfcaa655647f https://doi.org/10.1016/s0895-7177(00)00084-4
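
To make the problem in the description concrete, the following is a minimal Python sketch of a genetic search over pure stationary policies of a constrained average-reward MDP. It is not the authors' algorithm, whose encoding and operators are not given in this record. The penalty-based fitness, truncation selection, uniform crossover, per-state mutation, the unichain assumption used to evaluate a policy, and all names (`evaluate`, `fitness`, `genetic_search`, `penalty`, `p_mut`) are illustrative assumptions. A policy is encoded as a list holding one action index per state, and `P[s][a][t]` is the assumed probability of moving from state `s` to state `t` under action `a`.

```python
import random


def stationary_distribution(P_pi, iters=200):
    """Approximate the stationary distribution of a row-stochastic matrix.

    Assumes the chain is unichain. The lazy step (P + I) / 2 has the same
    stationary distribution as P but cannot be periodic.
    """
    n = len(P_pi)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * pi[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                nxt[j] += 0.5 * pi[i] * P_pi[i][j]
        pi = nxt
    return pi


def evaluate(policy, P, r, c):
    """Return the time-average values of both reward streams for a pure policy."""
    n = len(policy)
    P_pi = [P[s][policy[s]] for s in range(n)]
    pi = stationary_distribution(P_pi)
    g_r = sum(pi[s] * r[s][policy[s]] for s in range(n))
    g_c = sum(pi[s] * c[s][policy[s]] for s in range(n))
    return g_r, g_c


def fitness(policy, P, r, c, bound, penalty=100.0):
    """Objective reward minus a penalty for violating the constraint (assumed form)."""
    g_r, g_c = evaluate(policy, P, r, c)
    return g_r - penalty * max(0.0, bound - g_c)


def genetic_search(P, r, c, bound, pop_size=20, generations=50, p_mut=0.1, seed=0):
    """Search pure stationary policies; returns the fittest policy found."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    pop = [[rng.randrange(n_actions) for _ in range(n_states)]
           for _ in range(pop_size)]

    def score(pol):
        return fitness(pol, P, r, c, bound)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [rng.randrange(n_actions) if rng.random() < p_mut else g
                     for g in child]                           # per-state mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=score)


if __name__ == "__main__":
    # Hypothetical 3-state, 2-action MDP; all numbers are made up for illustration.
    P = [[[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
         [[0.3, 0.4, 0.3], [0.5, 0.1, 0.4]],
         [[0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]]
    r = [[1.0, 2.0], [0.5, 3.0], [2.0, 0.0]]   # reward to maximize
    c = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0]]   # reward subject to the lower bound
    best = genetic_search(P, r, c, bound=0.4)
    print(best, evaluate(best, P, r, c))
```

Because the search is heuristic, the returned policy is only the best one encountered; it may be suboptimal, and with a small `penalty` it may even violate the bound, so its feasibility should be checked with `evaluate` afterwards.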