Abstract: |
Robots are approaching a level of dynamic performance and intelligence at which the rules of the game are about to be rewritten and they can make a meaningful contribution to society. Such robots will allow military tasks to be resolved, rescue operations to be completed, and unstructured environments to be investigated. Because of their highly dynamic design and control, legged robots can demonstrate new kinds of locomotion skills in natural settings. Jumping in particular is a highly dynamic and difficult maneuver with complex dynamics that are hard to model, so we explore model-free learning methods that keep the representation of the system simple. The approach is to give the robot a task and let it figure out how to execute that task by trial and error, guided by rewards and penalties. In this paper the authors propose a model-free reinforcement learning approach for the hopping behavior of a robotic leg. The robot's state, namely the joint angles (θ1, θ2), the angular velocities (θ̇1, θ̇2), the base height h, and the foot position (x, y), defines the observation space explored by the agent. Because the joint-level torques exert forces on the ground, producing the reaction forces needed for the vertically constrained floating motion of the base, the joint torques τ1 and τ2 form the action space. The approach is model-free, and its complexity is minimized by using the Proximal Policy Optimization (PPO) algorithm, which trains the agent's policy. A neural network acting as the actor decides which steps to take in order to increase the chances of reaching the desired goal, which is for the robot to jump vertically, as shown in figure 1. The introduction surveys the state of the art, covering the variety of learning approaches discussed in recent years. The second section describes the learning approach and methods: it advocates model-free training of the system, states the requirements for the environment, and discusses the learning architecture in a subsection, including a detailed treatment of our custom-made Gym/MuJoCo environment of the leg. Section 2.1 covers the states, actions, reward, and learning agent shown in figure 1, and describes how the reward function is implemented. The results section discusses the simulation results, and we also explain the desired goal and its implementation. The authors found that this process speeds up the model's learning during the training phase: the process begins with the observation of the prior state and action, and if the model is struggling or receives a poor reward for its behavior, the agent repeats the activity until it succeeds in learning its lesson. Graphs in the paper detail every aspect of the experiment. Finally, as discussed in the conclusions, the leg is trained to jump higher than a particular height while also being rewarded for requiring little energy, producing significant results. [ABSTRACT FROM AUTHOR]
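To make the described setup concrete, the sketch below shows how such an observation space, torque action space, height-based reward, and PPO training loop might look in code. It is a minimal illustration only, assuming the gymnasium and stable-baselines3 APIs: the class name HoppingLegEnv, the bounds, the target_height and energy_weight constants, and the _simulate stub are all hypothetical placeholders, since the paper's actual MuJoCo model and reward constants are not given in the abstract.

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class HoppingLegEnv(gym.Env):
        """Two-joint hopping leg: observe (theta1, theta2, dtheta1, dtheta2, h, x, y),
        act with joint torques (tau1, tau2), as described in the abstract."""

        def __init__(self, target_height=0.5, energy_weight=1e-3):
            # Bounds are illustrative placeholders, not the paper's values.
            obs_high = np.array([np.pi, np.pi, 20.0, 20.0, 2.0, 1.0, 1.0], dtype=np.float32)
            self.observation_space = spaces.Box(-obs_high, obs_high, dtype=np.float32)
            self.action_space = spaces.Box(-10.0, 10.0, shape=(2,), dtype=np.float32)
            self.target_height = target_height
            self.energy_weight = energy_weight
            self.state = np.zeros(7, dtype=np.float32)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.state = np.zeros(7, dtype=np.float32)  # leg at rest on the ground
            return self.state, {}

        def step(self, action):
            tau = np.clip(action, self.action_space.low, self.action_space.high)
            # Placeholder dynamics: in the paper this is a MuJoCo simulation of the leg.
            self.state = self._simulate(tau)
            h = self.state[4]
            # Reward jumping above the target height, penalize torque (energy) use.
            reward = float(h > self.target_height) - self.energy_weight * float(np.sum(tau ** 2))
            return self.state, reward, False, False, {}

        def _simulate(self, tau):
            # Stub standing in for one MuJoCo physics step; returns the next state.
            return self.state

    from stable_baselines3 import PPO

    # Train an actor-critic policy with PPO, as in the paper's learning setup.
    model = PPO("MlpPolicy", HoppingLegEnv(), verbose=1)
    model.learn(total_timesteps=200_000)

The reward shown follows the abstract's description qualitatively, paying a bonus for clearing a height threshold and subtracting a small energy term proportional to the squared torques; the actual weighting used in the paper may differ.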