Autonomous acquisition of arbitrarily complex skills using locality based graph theoretic features: a syntactic approach to hierarchical reinforcement learning


Kumralbaş Z., Çavuş S. H., Coşkun K., Tümer B.

Evolving Systems, vol. 14, no. 6, pp. 957-980, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 14 Issue: 6
  • Publication Date: 2023
  • DOI: 10.1007/s12530-022-09478-6
  • Journal Name: Evolving Systems
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Pages: pp. 957-980
  • Keywords: Reinforcement learning, Hierarchical reinforcement learning, Skill construction, Skill coupling, Temporal abstraction, Community detection, Dynamic community detection
  • Marmara University Affiliation: Yes

Abstract

© 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

As the state/action space grows, learning a satisfactory policy with standard Reinforcement Learning (RL) algorithms such as flat Q-learning quickly becomes infeasible. One possible solution in such cases is hierarchical RL (HRL). In this work, we present two methods that operate over a graph-based, iteratively growing environment model and autonomously construct (1) skills (ASKA) and (2) arbitrarily elaborate superskills, or complexes, by defining an arbitrary number of hierarchies in HRL (ASKAC). We use dynamic community detection (DCD) to detect subgoals. Groups of environment states (i.e., subenvironments) are modeled as communities in the graph-theoretic sense, and because DCD considers only local changes in the partially grown graph, it lowers the time complexity of subgoal detection. DCD's drawback is oversegmentation, in which it mispartitions a subenvironment into smaller components. To keep ASKAC robust against this possible oversegmentation, we introduce the concept of skill coupling. Skill coupling not only robustly resolves the oversegmentation issue but also improves HRL: it builds more elaborate complexes (i.e., skill compositions) at an arbitrary number of hierarchies, and these complexes reduce the number of decisions needed to reach the goal. In addition to experiments that investigate the effect of parameters, the proposed methods are evaluated experimentally in the grid world and taxi driver benchmark environments.
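The abstract says that DCD revisits only local changes as the transition graph grows, and that communities stand in for subenvironments. The sketch below illustrates that idea only. It uses incremental label propagation, a swapped-in community-detection technique that is not necessarily the paper's DCD algorithm. It also assumes that states lying on a community border are subgoal candidates. The class `IncrementalCommunityGraph` and its methods are hypothetical names, and states are assumed to be hashable and orderable.

```python
from collections import Counter, defaultdict, deque


class IncrementalCommunityGraph:
    """Hypothetical sketch: an undirected state-transition graph whose community
    labels are updated locally (label propagation) after each new transition.
    Not the paper's DCD; states must be hashable and orderable."""

    def __init__(self):
        self.adj = defaultdict(set)  # state -> set of neighbouring states
        self.label = {}              # state -> community label

    def _best_label(self, v):
        """Most frequent neighbour label; keep the current label on ties."""
        counts = Counter(self.label[u] for u in self.adj[v])
        if not counts:
            return self.label[v]
        top = max(counts.values())
        candidates = {lab for lab, c in counts.items() if c == top}
        return self.label[v] if self.label[v] in candidates else min(candidates)

    def add_transition(self, s, s_next):
        """Record s -> s_next and re-label only the affected neighbourhood."""
        if s == s_next:
            return  # self-loops say nothing about community structure
        for v in (s, s_next):
            self.label.setdefault(v, v)  # a new state starts in its own community
        self.adj[s].add(s_next)
        self.adj[s_next].add(s)

        # Local propagation: start at the two endpoints and spread outward
        # only while labels keep changing.
        queue, queued = deque([s, s_next]), {s, s_next}
        while queue:
            v = queue.popleft()
            queued.discard(v)
            new = self._best_label(v)
            if new != self.label[v]:
                self.label[v] = new
                # A changed label can only affect the direct neighbours of v.
                for u in sorted(self.adj[v]):
                    if u not in queued:
                        queue.append(u)
                        queued.add(u)

    def communities(self):
        """Map each community label to its set of states."""
        groups = defaultdict(set)
        for v, lab in self.label.items():
            groups[lab].add(v)
        return dict(groups)

    def subgoals(self):
        """Assumed subgoal candidates: states with a neighbour in another community."""
        return {v for v in self.adj
                if any(self.label[u] != self.label[v] for u in self.adj[v])}


# Usage: two triangular "rooms" {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
g = IncrementalCommunityGraph()
for s, t in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    g.add_transition(s, t)
print(g.communities())       # {1: {0, 1, 2}, 4: {3, 4, 5}}
print(sorted(g.subgoals()))  # [2, 3]
```

On a tie, the update keeps the current label. As a result, every label change strictly increases the number of edges whose endpoints share a label, so the local propagation loop always terminates. Oversegmentation of the kind the abstract describes would appear here as one room split across several labels. The sketch does not model skill coupling.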