Showing 1 - 2 of 2 for search: '"Montesinos, Victoriano"'
Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternativ
External link:
http://arxiv.org/abs/2310.12921
Published in:
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference [Annu Int Conf IEEE Eng Med Biol Soc] 2019 Jul; Vol. 2019, pp. 2196-2201.