Showing 1 - 2 of 2 for search: '"Tan, Charlie B."'
Proximal policy optimization (PPO) is a widely-used algorithm for on-policy reinforcement learning. This work offers an alternative perspective of PPO, in which it is decomposed into the inner-loop estimation of update vectors, and the outer-loop app
External link:
http://arxiv.org/abs/2411.00666
Bounding and predicting the generalization gap of overparameterized neural networks remains a central open problem in theoretical machine learning. There is a recent and growing body of literature that proposes the framework of fractals to model opti
External link:
http://arxiv.org/abs/2406.02234