Description: |
In this research, an optimization methodology was introduced for improving bipedal robot locomotion controlled by reinforcement learning (RL) algorithms. Specifically, the study focused on optimizing the Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. The optimization process used the Tree-structured Parzen Estimator (TPE), a Bayesian optimization technique. All RL algorithms were applied to the same environment, the bipedal walker created within the OpenAI Gym framework. The optimization involved fine-tuning key hyperparameters, including the learning rate, discount factor, generalized advantage estimation (GAE) parameter, entropy coefficient, and Polyak update coefficient, and the study comprehensively analyzed the impact of these hyperparameters on the performance of the RL algorithms. The results of the optimization were promising: the fine-tuned RL algorithms demonstrated significant performance improvements. The mean reward values over the 10 trials were as follows: PPO achieved an average reward of 181.3, A2C an average reward of −122.2, SAC an average reward of 320.3, and TD3 an average reward of 278.6. These outcomes underscore the effectiveness of the optimization approach in enhancing the locomotion capabilities of the bipedal robot using RL techniques.
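
To make the described pipeline concrete, below is a minimal sketch of TPE-based hyperparameter tuning for one of the four algorithms (PPO) on the bipedal walker task. It assumes the Optuna library (whose TPESampler implements the Tree-structured Parzen Estimator), Stable-Baselines3 for the PPO implementation, and Gymnasium's BipedalWalker-v3 environment; none of these specific libraries, nor the search ranges and training budget shown, are stated in the study, and the Polyak update coefficient (tau) would be tuned analogously for the off-policy SAC and TD3 agents rather than for PPO.

    # Sketch only: Optuna/Stable-Baselines3/Gymnasium and all ranges are assumptions.
    import gymnasium as gym
    import optuna
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    def objective(trial: optuna.Trial) -> float:
        # Hyperparameters named in the abstract; search ranges are illustrative.
        params = {
            "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
            "gamma": trial.suggest_float("gamma", 0.95, 0.9999),        # discount factor
            "gae_lambda": trial.suggest_float("gae_lambda", 0.8, 1.0),  # GAE parameter
            "ent_coef": trial.suggest_float("ent_coef", 1e-8, 1e-1, log=True),  # entropy coefficient
        }
        env = gym.make("BipedalWalker-v3")
        model = PPO("MlpPolicy", env, verbose=0, **params)
        model.learn(total_timesteps=100_000)  # short budget for the sketch
        # Score the candidate configuration by its mean evaluation reward.
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
        env.close()
        return mean_reward

    study = optuna.create_study(
        direction="maximize",
        sampler=optuna.samplers.TPESampler(seed=0),  # Tree-structured Parzen Estimator
    )
    study.optimize(objective, n_trials=10)
    print("Best hyperparameters:", study.best_params)

After the search finishes, study.best_params holds the best configuration found, which can then be used to retrain the agent with a full training budget and to report the mean reward over repeated evaluation trials, mirroring the per-algorithm averages quoted above.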