Autonomous acquisition of arbitrarily complex skills using locality based graph theoretic features: a syntactic approach to hierarchical reinforcement learning

Author:	Borahan Tumer, Zeynep Kumralbaş, Kutalmış Coşkun, Semiha Hazel Çavuş
Contributors:	Kumralbaş Z., Çavuş S. H., Coşkun K., Tümer B.
Year of publication:	2023
Subject:	Dynamic community detection; Control and Optimization; Temporal abstraction; Natural Sciences (SCI); Engineering; AUTOMATION & CONTROL SYSTEMS; Modeling and Simulation; Skill construction; MATHEMATICS; Information Systems, Communication and Control Engineering; Hierarchical reinforcement learning; Reinforcement learning; Skill coupling; Computer Sciences; Engineering, Computing & Technology (ENG); Computer Science Applications; Community detection; Natural Sciences; COMPUTER SCIENCE; MATHEMATICS, APPLIED; Physical Sciences; Control and Systems Engineering; Engineering and Technology; Control and System Engineering
Source:	Evolving Systems.
ISSN:	1868-6486 1868-6478
DOI:	10.1007/s12530-022-09478-6
Description:	© 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature. As the state/action space grows, learning a satisfactory policy with regular Reinforcement Learning (RL) algorithms such as flat Q-learning quickly becomes infeasible. One possible solution is to employ hierarchical RL (HRL). In this work, we present two methods that autonomously construct (1) skills (ASKA) and (2) arbitrarily elaborate superskills, or complexes, by defining an arbitrary number of hierarchies in HRL (ASKAC), over a graph-based, iteratively growing environment model. We detect subgoals with dynamic community detection (DCD), in which groups of environment states (i.e., subenvironments) are modeled as graph-theoretic communities; because DCD considers only local changes in the partially grown graph, it lowers the time complexity of subgoal detection. DCD's drawback is oversegmentation, in which it mispartitions a subenvironment into smaller components. To keep ASKAC robust against DCD's possible oversegmentation, we introduce the concept of skill coupling. Skill coupling not only solves the oversegmentation issue robustly but also improves HRL by building more elaborate complexes (i.e., skill compositions) over an arbitrary number of hierarchies, and these complexes reduce the number of decisions needed to reach the goal. In addition to experiments that investigate the effect of parameters, the proposed methods are evaluated experimentally in grid world and taxi driver benchmark environments.
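Illustration:	The idea of reading subgoals off a community partition of the state graph can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the ASKA/ASKAC procedure from the article: the two-room grid, the helper names `build_two_room_grid` and `boundary_states`, and the hand-written partition (which stands in for the output of a dynamic community detection step) are all hypothetical. Treating states on a community border as subgoal candidates is a common convention in community-based skill discovery and may differ from the authors' exact definition.

```python
from collections import defaultdict


def build_two_room_grid(width=7, height=3, wall_x=3, door_y=1):
    """Build an undirected state graph for a grid split by a wall with one door.

    Every column except `wall_x` is fully traversable; the wall column
    contains a single traversable cell (the door) at row `door_y`.
    """
    cells = {
        (x, y)
        for x in range(width)
        for y in range(height)
        if x != wall_x or y == door_y
    }
    graph = defaultdict(set)
    for (x, y) in cells:
        # Connect each cell to its right and upper neighbour (4-connectivity).
        for dx, dy in ((1, 0), (0, 1)):
            neighbour = (x + dx, y + dy)
            if neighbour in cells:
                graph[(x, y)].add(neighbour)
                graph[neighbour].add((x, y))
    return graph


def boundary_states(graph, community_of):
    """Return states that have at least one neighbour in another community."""
    return {
        state
        for state, neighbours in graph.items()
        if any(community_of[n] != community_of[state] for n in neighbours)
    }


graph = build_two_room_grid()

# ASSUMPTION: a hand-written partition replaces real community detection.
# Community 0 = left room plus the door cell, community 1 = right room.
community_of = {state: (0 if state[0] <= 3 else 1) for state in graph}

subgoals = boundary_states(graph, community_of)
print(sorted(subgoals))
```

In this toy layout the only border states are the door cell and the cell just past it, which matches the intuition that doorways between subenvironments act as subgoals. An oversegmented partition (for example, one room split into two communities) would produce extra border states inside a room, which is the situation skill coupling is described as addressing.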
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4bf69cbd73d22dc9d5e1c65a89f6fe43 https://doi.org/10.1007/s12530-022-09478-6