Showing 1 - 1 of 1 for search: '"Dhankar, Harshit"'
In the realm of multi-armed bandit problems, the Gittins index policy is known to be optimal in maximizing the expected total discounted reward obtained from pulling the Markovian arms. In most realistic scenarios, however, the Markovian state transitio
External link:
http://arxiv.org/abs/2405.01157