Reınforcement learnıng and evolutıonary algorıthms for contaıner loadıng problem

Autor:	Tijjani, Sani
Přispěvatelé:	Özkaya, Armağan, Bilgisayar Mühendisliği Ana Bilim Dalı
Jazyk:	angličtina
Rok vydání:	2014
Předmět:	Computer Engineering and Computer Science and Control Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol
Popis:	Container Loading Problem (CLP) is a space utilization problem subject to various constraints. An example of it is the placement of containers in storage so as to minimize the waste of space. Other constraints that may be imposed include a certain loading order and an even weight distribution. Although evolutionary algorithms have been extensively studied to solve this problem, Reinforcement Learning (RL) which is a means of learning optimal behaviors by interacting with the environment, has not received enough attention in this respect. This work explores the use of RL as an alternative for tackling CLP so as to minimize the waste of space while maximizing the number of containers. We have applied five different RL algorithms (Q-learning, TD( ), Monte-Carlo, TDQ-learning and SARSA), and two types of evolutionary algorithms (Genetic Algorithm and Ant Colony Optimization) to solve this problem. Simulations have been carried out using MATLAB to compare these algorithms based on space utilization, number of containers, simulation time and speed of convergence. The simulation parameters are set so that the algorithms are allowed to fill in storage yards 100% with containers of different sizes. Results show that, in general, RL may not guarantee the best results, but can minimize the computational difficulty providing a simple way to solve this problem. Genetic Algorithm (GA) on the other hand gives best speed of convergence for small storage yards, and, unlike other approaches that may require complex computations, four RL algorithms, namely Q-learning, TD( ), Monte-Carlo and SARSA, give better speed of convergence than GA for larger storage yards used in our simulations. Ant Colony Optimization (ACO), while generally being worse than the others in terms of average convergence time, performs better than the ordinary Q-learning for the largest storage yard area used in our simulations and gives a result similar to GA. Growth of ACO's average convergence time seems to be slower than those of others, indicating that it has the potential to have better convergence times with increasing storage yard area sizes. In terms of the number of containers that can be packed into storage yard when the number of containers is not restricted, all RL algorithms give approximately the same result. As the storage yard size becomes larger, however, Q-learning starts performing better than all the other RL algorithms in this respect and Monte-Carlo gradually worsens even though it is the best of all for filling small storage yards. But in terms of simulation time (for RL algorithms only) TDQ performs the best for all storage yard area combinations, except for the smallest area in which it comes in second position after TD( ). This is because learning optimal value in TDQ can take place under any policy and learning rate effect. TD( ) comes second for all remaining combinations while MC and SARSA come third and fourth respectively. Container Loading Problem (CLP) is a space utilization problem subject to various constraints. An example of it is the placement of containers in storage so as to minimize the waste of space. Other constraints that may be imposed include a certain loading order and an even weight distribution. Although evolutionary algorithms have been extensively studied to solve this problem, Reinforcement Learning (RL) which is a means of learning optimal behaviors by interacting with the environment, has not received enough attention in this respect. This work explores the use of RL as an alternative for tackling CLP so as to minimize the waste of space while maximizing the number of containers. We have applied five different RL algorithms (Q-learning, TD( ), Monte-Carlo, TDQ-learning and SARSA), and two types of evolutionary algorithms (Genetic Algorithm and Ant Colony Optimization) to solve this problem. Simulations have been carried out using MATLAB to compare these algorithms based on space utilization, number of containers, simulation time and speed of convergence. The simulation parameters are set so that the algorithms are allowed to fill in storage yards 100% with containers of different sizes. Results show that, in general, RL may not guarantee the best results, but can minimize the computational difficulty providing a simple way to solve this problem. Genetic Algorithm (GA) on the other hand gives best speed of convergence for small storage yards, and, unlike other approaches that may require complex computations, four RL algorithms, namely Q-learning, TD( ), Monte-Carlo and SARSA, give better speed of convergence than GA for larger storage yards used in our simulations. Ant Colony Optimization (ACO), while generally being worse than the others in terms of average convergence time, performs better than the ordinary Q-learning for the largest storage yard area used in our simulations and gives a result similar to GA. Growth of ACO's average convergence time seems to be slower than those of others, indicating that it has the potential to have better convergence times with increasing storage yard area sizes. In terms of the number of containers that can be packed into storage yard when the number of containers is not restricted, all RL algorithms give approximately the same result. As the storage yard size becomes larger, however, Q-learning starts performing better than all the other RL algorithms in this respect and Monte-Carlo gradually worsens even though it is the best of all for filling small storage yards. But in terms of simulation time (for RL algorithms only) TDQ performs the best for all storage yard area combinations, except for the smallest area in which it comes in second position after TD( ). This is because learning optimal value in TDQ can take place under any policy and learning rate effect. TD( ) comes second for all remaining combinations while MC and SARSA come third and fourth respectively. 95
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=od_____10208::181bcd53e06cfde21a84af80927c9231 https://acikbilim.yok.gov.tr/handle/20.500.12812/666708 Zobrazit plný text záznamu