Policy Iteration Reinforcement Learning-based control using a Grey Wolf Optimizer algorithm

Authors:	Raul-Cristian Roman, Radu-Emil Precup, Iuliu Alexandru Zamfirache, Emil M. Petriu
Publication year:	2022
Subject:	0209 industrial biotechnology; Information Systems and Management; Optimization problem; Artificial neural network; Computer science; Particle swarm optimization; 02 engineering and technology; Servomechanism; Computer Science Applications; Theoretical Computer Science; law.invention; 020901 industrial engineering & automation; Artificial Intelligence; Control and Systems Engineering; law; Convergence (routing); 0202 electrical engineering, electronic engineering, information engineering; Reinforcement learning; 020201 artificial intelligence & image processing; Gradient descent; Algorithm; Metaheuristic; Software
Source:	Information Sciences. 585:162-175
ISSN:	0020-0255
DOI:	10.1016/j.ins.2021.11.051
Description:	This paper presents a new Reinforcement Learning (RL)-based control approach that uses Policy Iteration (PI) and the metaheuristic Grey Wolf Optimizer (GWO) algorithm to train the Neural Networks (NNs). Due to an efficient tradeoff between exploration and exploitation, the GWO algorithm shows good results in NN training and in solving complex optimization problems. The proposed approach is compared with the classical PI RL-based control approach, which uses the Gradient Descent (GD) algorithm, and with the RL-based control approach that uses the metaheuristic Particle Swarm Optimization (PSO) algorithm. The experiments are conducted on nonlinear servo system laboratory equipment. Each approach is evaluated on how well it solves the optimal reference tracking control problem for the position control of an experimental servo system. The policy NNs specific to all three approaches are implemented as state feedback with integrator controllers to remove the steady-state control errors and thus ensure the convergence of the objective function. Because of the random nature of metaheuristic algorithms, the experiments for the GWO and PSO algorithms are run multiple times and the results are averaged before the conclusions are presented. The experimental results show that, for the control objective considered in this paper, the GWO algorithm is a better solution than the GD and PSO algorithms.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::c7553d4316116a9f773288d803ba9028 https://doi.org/10.1016/j.ins.2021.11.051