Reinforcement learning for trench excavation

Author: Rankin, Jake
Year of publication: 2023
Subject:
DOI: 10.26174/thesis.lboro.22714543.v1
Description: Excavation autonomy is an area of industrial interest because there is a significant shortage of skilled excavator operators, while the industry faces tighter budget constraints than ever before. Trenching is particularly affected, as it is difficult to perform yet integral to most construction projects. Traditionally, autonomy in excavation has been rule-based or control theory-based, which is difficult to scale and to apply to the varying scenarios common in construction, such as different ground conditions or trench designs. Reinforcement learning has been applied successfully in similar fields, such as robotics, demonstrating its capability to handle non-linearity and complex environments. This thesis addresses the deployment of reinforcement learning to a trench excavation task, with the aim of producing operator-like behaviour. Twin-Delayed Deep Deterministic Policy Gradients (TD3) was identified as the most appropriate algorithm for the research, due to its superior performance on robotics tasks. To deploy TD3 to trenching, a three-part strategy was used that focused on developing the environment, state, and reward function that TD3 would interact with. First, the overall system architecture and environment for a reinforcement learning-based autonomy system were designed, using the Common Data Environment to store and deploy a trained algorithm and the excavation plans. This was designed alongside driver data analysis to provide additional context on the challenges of trenching. Next, the selection of optimal machine sensors was studied, including a comparison of feature selection methods against existing sensor arrays. These were applied to a neural network trained on driver data, on the assumption that this network would resemble the policy of a trained TD3 algorithm.
This was done to determine the impact of the state on the predictive performance of the policy whilst avoiding the inclusion of a reward function, allowing several machine tasks to use the same approach. Finally, TD3 was deployed to an excavator, with a focus on developing the reward function for trenching. This was done by breaking the trenching task into smaller tasks to develop the distance and mass rewards, before unifying them into one reward function. Focusing on individual elements of a reward function allows the impact of each element to be understood more effectively. TD3 was then deployed to perform a trench excavation task, where it was able to dig within the trench region and dump soil into a hopper. The novel findings of this research were: 1) an exploratory analysis of driver data during trenching; 2) an autonomy roadmap for excavation; 3) an architecture for an RL-based autonomy system for excavation that utilises the driver, simulation, and the Common Data Environment; 4) the use of feature selection methods to determine machine sensor inputs for the policy network of TD3, without using the reward function; 5) a new methodology for developing a reward function; 6) the deployment of TD3 to perform a trench excavation task.
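The idea of developing distance and mass rewards separately and then unifying them can be illustrated with a minimal sketch. This is not the reward function from the thesis, which the abstract does not specify: the exponential distance shaping, the clipped mass fraction, the weighted sum, and every name and default value below are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights for combining the two reward terms."""
    distance: float = 1.0
    mass: float = 1.0


def distance_reward(bucket_xy, target_xy, scale=1.0):
    """Assumed shaping term: 1.0 at the target, decaying with Euclidean distance."""
    d = math.dist(bucket_xy, target_xy)
    return math.exp(-d / scale)


def mass_reward(mass_in_hopper, target_mass):
    """Assumed term: fraction of the target soil mass delivered, clipped to [0, 1]."""
    if target_mass <= 0:
        raise ValueError("target_mass must be positive")
    return max(0.0, min(1.0, mass_in_hopper / target_mass))


def trench_reward(bucket_xy, target_xy, mass_in_hopper, target_mass,
                  weights=RewardWeights()):
    """Unified reward: weighted sum of the separately developed terms."""
    return (weights.distance * distance_reward(bucket_xy, target_xy)
            + weights.mass * mass_reward(mass_in_hopper, target_mass))
```

Keeping each term as its own function means it can be tuned and inspected on a smaller sub-task before the weighted sum is used for the full trenching task.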
Database: OpenAIRE