Search results - "Dhankar, Harshit"

Report

Tabular and Deep Reinforcement Learning for Gittins Index

Authors: Dhankar, Harshit; Mishra, Kshitij; Bodas, Tejas

In the realm of multi-arm bandit problems, the Gittins index policy is known to be optimal in maximizing the expected total discounted reward obtained from pulling the Markovian arms. In most realistic scenarios, however, the Markovian state transitio

External link: http://arxiv.org/abs/2405.01157
