Simulation-based deep reinforcement learning for multi-objective identical parallel machine scheduling problem

Authors:	Sohyun Nam, Young-in Cho, Jong Hun Woo
Language:	English
Publication year:	2024
Subject:	Parallel machine scheduling problem; Deep reinforcement learning; Discrete-event simulation; Proximal policy optimization; Ocean engineering (TC1501-1800); Naval architecture. Shipbuilding. Marine engineering (VM1-989)
Source:	International Journal of Naval Architecture and Ocean Engineering, Vol 16, Iss , Pp 100629- (2024)
Document type:	article
ISSN:	2092-6782
DOI:	10.1016/j.ijnaoe.2024.100629
Description:	In the shipbuilding industry, traditional optimization studies based on linear programming and constraint programming have been conducted to solve mid-term and long-term scheduling problems. However, because of their long computation times, these methods are of limited use for short-term scheduling in the unit production systems of shipbuilding processes, where various environmental uncertainties must be considered. This study employs a deep reinforcement learning approach to develop a dynamic scheduling algorithm for the welding process in profile shops, considering the random arrival of materials and variability in processing time. The scheduling problem of the welding process is formulated as a multi-objective identical parallel machine scheduling problem that aims to minimize both setup time and tardiness. The study proposes a novel Markov decision process model for this problem, incorporating setup requirements and due-date-related constraints into the state representation, action modelling, and reward design. Based on the proposed Markov decision process model, the study also develops a learning environment in which a discrete-event simulation model of the welding process handles state transitions, so that the uncertainties in the welding process are reflected. In the training phase of the scheduling agent, the Proximal Policy Optimization algorithm is applied to learn the scheduling policy, which is approximated by deep neural networks. The performance of the proposed algorithm is validated against four priority rules (SSPT, ATCS, MDD, and COVERT) in various test scenarios with different workloads and levels of variability in processing time.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/cab882a94e474438a185103940601445
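
The description names priority rules as baselines without spelling them out. As a rough illustration only, the sketch below shows textbook forms of two of them: SSPT (shortest setup plus processing time) and MDD (modified due date, where a job's priority value is max(due date, current time + processing time)). The `Job` fields, the family-based `setup_time` helper, and the fixed setup cost of 2.0 are hypothetical choices made for this example; the article may define its variants, data, and setup model differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    family: str        # hypothetical setup family (an assumption for this sketch)
    processing: float  # processing time
    due: float         # due date


def setup_time(last_family: str, job: Job, cost: float = 2.0) -> float:
    """Hypothetical setup model: a fixed cost when the family changes, else zero."""
    return 0.0 if job.family == last_family else cost


def sspt(queue: list[Job], now: float, last_family: str) -> Job:
    """SSPT: pick the job with the smallest setup time plus processing time."""
    return min(queue, key=lambda j: setup_time(last_family, j) + j.processing)


def mdd(queue: list[Job], now: float, last_family: str) -> Job:
    """MDD: pick the job with the smallest modified due date max(due, now + processing)."""
    return min(queue, key=lambda j: max(j.due, now + j.processing))


# Illustrative data, not taken from the article.
jobs = [
    Job("A", "x", processing=6.0, due=20.0),
    Job("B", "y", processing=3.0, due=6.0),
    Job("C", "x", processing=4.0, due=7.0),
]

# A machine that just finished a family-"x" job becomes idle at time 0.
print(sspt(jobs, now=0.0, last_family="x").name)
print(mdd(jobs, now=0.0, last_family="x").name)
```

With this data the two rules disagree: SSPT avoids the family change and favours the short job, while MDD favours the job whose due date is most pressing. That tension between setup time and tardiness is the trade-off a multi-objective scheduling policy has to balance.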