Learning to balance an NAO robot using reinforcement learning with symbolic inverse kinematic

Author:	Onder Tutsoy, Duygun Erol Barkana, Şule Çolak
Contributors:	O. Tutsoy, D. Erol Barkana, S. Colak, Yeditepe Üniversitesi
Language:	English
Year of publication:	2017
Subject:	Nao robot; 0209 industrial biotechnology; Engineering; reinforcement learning; Modelica; Inverse; Complete symbolic inverse kinematic solution; 02 engineering and technology; Kinematics; MapleSim; 020901 industrial engineering & automation; 0202 electrical engineering, electronic engineering, information engineering; Instrumentation; Balance (ability); Control algorithm; business.industry; NAO lower body balancing; Control engineering; autonomous humanoid robot; convergent value function; multi-body modelling software; 020201 artificial intelligence & image processing; Artificial intelligence; business; Humanoid robot
Description:	An autonomous humanoid robot (HR) with learning and control algorithms is able to balance itself during sitting down, standing up, walking and running operations, as humans do. In this study, reinforcement learning (RL) with a complete symbolic inverse kinematic (IK) solution is developed to balance the full lower body of a three-dimensional (3D) NAO HR, which has 12 degrees of freedom. The IK solution converts the lower body trajectories, which are learned by RL, into reference positions for the joints of the NAO robot. This reduces the dimensionality of the learning and control problems, since the IK integrated with the RL eliminates the need to use the whole set of HR states. The IK solution in 3D space takes into account not only the legs but also the full lower body; hence, it is possible to incorporate the effect of the foot and hip lengths on the IK solution. The accuracy and capability of following real joint states are evaluated in the simulation environment. MapleSim is used to model the full lower body, and the developed RL is combined with this model by utilizing Modelica and Maple software properties. The results of the simulation show that the value function is maximized, the temporal difference error is reduced to zero, the lower body is stabilized in the upright position, and the convergence speed of the RL is improved with the use of the symbolic IK solution. © The Author(s) 2017.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::57a5911da94bb0e2d5771d3f573147cb https://hdl.handle.net/20.500.11831/3812