Showing 1 - 2 of 2 for search: '"Ran, Yuhang"'
We consider the problem of learning the best possible policy from a fixed dataset, known as offline Reinforcement Learning (RL). A common taxonomy of existing offline RL works is policy regularization, which typically constrains the learned policy by
External link:
http://arxiv.org/abs/2306.06569
Author:
Yang, Yuping, Xu, Lijia, Zhao, Yongpeng, Wang, Yuchao, Wu, Zhijun, Kang, Zhiliang, Zou, Zhiyong, Huang, Hui, He, Yong, Liu, Fei, Tang, Zuoliang, Feng, Ao, Ran, Yuhang, Feng, Shuo
Published in:
In Chemical Engineering Journal, 1 January 2025, 503