The Control Method of Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism to Multi-DOF Manipulator
Authors: | Zhaomei Sun, Huajie Hong, Dasheng Xu, Yangyang Hou, Zhe Zeng |
---|---|
Year of publication: | 2021 |
Subject: | 0209 industrial biotechnology; reward function; Computer Networks and Communications; Computer science; lcsh:TK7800-8360; 02 engineering and technology; Kinematics; Motion (physics); Field (computer science); 020901 industrial engineering & automation; Position (vector); Control theory; Reinforcement learning; Electrical and Electronic Engineering; deep reinforcement learning; lcsh:Electronics; Process (computing); Function (mathematics); manipulator; 021001 nanoscience & nanotechnology; Hardware and Architecture; Control and Systems Engineering; Signal Processing; rebirth mechanism; 0210 nano-technology; Efficient energy use |
Source: | Electronics, Vol. 10, Iss. 7, Art. 870 (2021) |
ISSN: | 2079-9292 |
DOI: | 10.3390/electronics10070870 |
Description: | Deep reinforcement learning is an active research area in artificial intelligence, and applying it to manipulator control lets a manipulator learn motion skills without a kinematic model. Twin Delayed Deep Deterministic Policy Gradient (TD3) was proposed to suppress the value overestimation bias of Deep Deterministic Policy Gradient (DDPG) networks. This paper further suppresses that bias for multi-degree-of-freedom (multi-DOF) manipulator learning by proposing Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3). Experimental results show that RTD3 is effective on multi-DOF manipulators, improving learning ability by 29.15% relative to TD3. A step-by-step reward function is also proposed specifically for the learning and innovation of a multi-DOF manipulator's motion ability. The task is treated as a continuous decision process problem to guide the manipulator's learning, and learning efficiency is improved by optimizing experience replay. To measure the point-to-point positioning ability of a manipulator, the paper presents a new evaluation index, energy efficiency distance, based on the characteristics of the continuous decision process problem; it allows a more comprehensive and fair evaluation of how well the manipulator has learned its motion ability. |
Database: | OpenAIRE |