Context Detection and Identification In Multi-Agent Reinforcement Learning With Non-Stationary Environment (Turkish title: Çok Etmenli Pekiştirmeli Öğrenmede Devingen Ortamlarda Bağlam Değişim Tespiti ve Tanımlama)


Talha Selamet E., Tumer B.

30th Signal Processing and Communications Applications Conference, SIU 2022, Safranbolu, Türkiye, 15 - 18 May 2022

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu55565.2022.9864802
  • City of Publication: Safranbolu
  • Country of Publication: Türkiye
  • Keywords: context detection, multi-agent, non-stationary environment, reinforcement learning
  • Affiliated with Marmara University: Yes

Abstract

© 2022 IEEE. Reinforcement learning methods are mostly built on the assumption that the environment is stationary. However, most real-world environments are non-stationary; here we assume that such an environment is composed of several stationary components (i.e., sub-environments or contexts). Methods that rely on the stationarity assumption therefore cannot learn non-stationary environments. The Reinforcement Learning - Context Detection (RL-CD) method enables an agent to learn the environment without prior information, detect the environment's context change points, and create a partial model for each context. However, RL-CD is designed for a single-agent environment and has shortcomings in multi-agent learning. In this study, we introduce a new approach based on RL-CD, called Multi-Agent Reinforcement Learning - Context Detection (MARL-CD), which can both detect context change points and enable agents to learn non-stationary environments in multi-agent settings. MARL-CD is more efficient at detecting both context changes caused by the agents acting on the environment and context changes of the environment itself. It enables an agent to detect context changes not only from changes in the environment dynamics but also from policy changes of agents in the environment. The experimental results show that agents using MARL-CD spend 16% less energy and are more efficient than RL-CD at detecting change points accurately.
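
The idea of keeping one partial model per context and detecting a change point when no existing model predicts well can be illustrated with a minimal sketch. This is not the authors' RL-CD or MARL-CD algorithm: the names `PartialModel` and `ContextDetector`, the reward-only model, the confidence weighting, and the values of `min_quality` and `trace_rate` are all assumptions chosen for illustration, and the multi-agent part (detecting policy changes of agents) is not modeled.

```python
from collections import defaultdict


class PartialModel:
    """Hypothetical per-context model: running mean reward per (state, action)."""

    def __init__(self):
        self.mean = defaultdict(float)
        self.count = defaultdict(int)
        self.quality = 0.0  # smoothed prediction quality, stays within [-1, 1]

    def instant_quality(self, key, reward):
        # Confidence is 0 for unseen pairs, so a fresh model scores a neutral 0.
        n = self.count[key]
        confidence = n / (n + 1)
        error = min(1.0, abs(reward - self.mean[key]))
        return confidence * (1.0 - 2.0 * error)

    def learn(self, key, reward):
        # Incremental update of the running mean.
        self.count[key] += 1
        self.mean[key] += (reward - self.mean[key]) / self.count[key]


class ContextDetector:
    """Selects the best-predicting model; adds a new one if all predict poorly."""

    def __init__(self, min_quality=-0.1, trace_rate=0.3):
        self.min_quality = min_quality  # assumed threshold for a new context
        self.trace_rate = trace_rate    # assumed smoothing rate of the quality
        self.models = [PartialModel()]
        self.active = 0

    def observe(self, state, action, reward):
        key = (state, action)
        # Score every existing model on the new sample.
        for model in self.models:
            q = model.instant_quality(key, reward)
            model.quality += self.trace_rate * (q - model.quality)
        best = max(range(len(self.models)), key=lambda i: self.models[i].quality)
        if self.models[best].quality < self.min_quality:
            # No known context explains the data: treat this as a change point.
            self.models.append(PartialModel())
            best = len(self.models) - 1
        self.active = best
        # Only the active model learns from the sample.
        self.models[best].learn(key, reward)
        return best


detector = ContextDetector()
rewards = [0.0] * 20 + [1.0] * 20 + [0.0] * 20  # context A, then B, then A again
history = [detector.observe(0, 0, r) for r in rewards]
print(len(detector.models), history[19], history[39], history[59])
```

In this toy run the reward for a single state-action pair switches from 0 to 1 and back. The detector ends with two partial models and returns to the first one when the first context reappears, rather than relearning it from scratch.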