Showing 1 - 1 of 1 for search: '"Nil Stolt Anso"'
Published in:
ArXiv. Cornell University Press
University of Groningen
In this paper, a new offline actor-critic learning algorithm is introduced: Sampled Policy Gradient (SPG). SPG samples in the action space to calculate an approximated policy gradient by using the critic to evaluate the samples. This sampling allows
External link:
https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::48e762ba8cf82344d42e62932113380d
https://research.rug.nl/en/publications/769cda31-8ddf-443c-b13e-89006b8d3f91