Decentralized learning in finite Markov chains
Author: | Kumpati S. Narendra, Richard M. Wheeler |
Year: | 1986 |
Subject: |
Equilibrium point, Stochastic control, Computer Science and Game Theory, Mathematical optimization, Markov kernel, Markov chain, Variable-order Markov model, Stochastic game, Q-learning, Markov process, Partially observable Markov decision process, Markov model, Continuous-time Markov chain, Ergodic theory, Markov property, Markov decision process, Control and Systems Engineering, Electrical and Electronic Engineering, Computer Science Applications, Mathematics |
Source: | IEEE Transactions on Automatic Control. 31:519-526 |
ISSN: | 0018-9286 |
DOI: | 10.1109/tac.1986.1104342 |
Description: | The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains. A new result on convergence in identical payoff games with a unique equilibrium point is also presented. |
Database: | OpenAIRE |
External link: |
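The description outlines the general setup: one simple learning unit per multi-action state, each updating its action probabilities in small steps from a scalar reward signal. The following is a minimal illustrative sketch of that idea using a linear reward-inaction (L_R-I) learning automaton, a standard scheme from the learning-automata literature; the two-state environment, its reward probabilities, and all names here are hypothetical, not taken from the paper.

```python
import random

class LinearRewardInactionAutomaton:
    """L_R-I scheme: on a favorable response, shift probability mass
    toward the chosen action; on an unfavorable one, do nothing."""

    def __init__(self, n_actions, step=0.02):
        self.p = [1.0 / n_actions] * n_actions  # action probabilities
        self.step = step                         # small step size
        self.last = None                         # last chosen action

    def choose(self):
        r, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r <= acc:
                self.last = i
                return i
        self.last = len(self.p) - 1
        return self.last

    def update(self, reward):
        # Reward-inaction: the probability vector moves only on success.
        if reward:
            a, j = self.step, self.last
            self.p = [pi + a * (1.0 - pi) if i == j else pi * (1.0 - a)
                      for i, pi in enumerate(self.p)]

# Hypothetical environment: two states, two actions each; action 1 is
# rewarded far more often in both states, so each automaton should
# concentrate its probability on action 1.
reward_prob = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.1, (1, 1): 0.9}

random.seed(0)
automata = [LinearRewardInactionAutomaton(2) for _ in range(2)]
state = 0
for _ in range(40000):
    act = automata[state].choose()
    reward = 1 if random.random() < reward_prob[(state, act)] else 0
    automata[state].update(reward)
    state = random.randrange(2)  # toy state transition for this sketch

print([round(a.p[1], 2) for a in automata])
```

The key property the paper relies on is visible even in this toy version: because each automaton updates in sufficiently small steps and only in response to its own reward signal, no coordination or shared information between the decision makers is needed for the group to settle on the better actions.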