Showing 1 - 10 of 613 for search: '"on-off policy"'
Published in:
ICT Express, Vol 10, Iss 6, Pp 1308-1314 (2024)
Deep reinforcement learning (RL) has emerged as a promising solution for autonomous devices requiring sequential decision-making. In the online RL framework, the agent must interact with the environment to collect data, making sample efficiency the m
External link:
https://doaj.org/article/c8f6adc0406b4d2b895da544a076c00c
Author:
Jineng Ren
Published in:
International Journal of Computational Intelligence Systems, Vol 17, Iss 1, Pp 1-18 (2024)
This paper proposes a gradient-based multi-agent actor-critic algorithm for off-policy reinforcement learning using importance sampling. Our algorithm is incremental with full gradients, and its complexity per iteration scales linearly with
External link:
https://doaj.org/article/3fd1277c9b234751b5bfbbae5d5a0742
Published in:
Безопасность информационных технологий, Vol 31, Iss 2, Pp 90-110 (2024)
Striking a balance between safety and performance remains a critical concern, despite advancements in the field. To address this issue, a versatile framework named Safety Goes Along with Performance (SGAWP) is proposed, centered on off-policy algorit
External link:
https://doaj.org/article/723b9a9315634d5e981dd3abef70ba8f
Published in:
In Journal of the Franklin Institute January 2025 362(1)
Published in:
Frontiers in Energy Research, Vol 12 (2024)
With the promotion and development of clean energy, it is challenging to ensure the optimization of control performance in frequency control of the hydropower-photovoltaic hybrid microgrid system caused by the output power fluctuation of photovoltaic
External link:
https://doaj.org/article/9d11ab20ba494df387db4d42adb0c4ac
Published in:
In Neurocomputing 7 March 2025 621
Published in:
Sensors, Vol 24, Iss 23, p 7746 (2024)
Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision
External link:
https://doaj.org/article/84b8c96230b94a4ca4e4cbbc1b4050c7
Published in:
Applied Sciences, Vol 14, Iss 23, p 11114 (2024)
Temporal difference (TD) learning is a powerful framework for value function approximation in reinforcement learning. However, standard TD methods often struggle with feature representation and off-policy learning challenges. In this paper, we propos
External link:
https://doaj.org/article/318b65f0260b42caabbbbb87af59ce5e
Published in:
Mathematics, Vol 12, Iss 22, p 3603 (2024)
In reinforcement learning, off-policy temporal difference learning methods have gained significant attention due to their flexibility in utilizing existing data. However, traditional off-policy temporal difference methods often suffer from poor conve
External link:
https://doaj.org/article/7bc1dba110154c2db1ac64f9ee67dbd9
Published in:
IEEE Open Journal of the Communications Society, Vol 5, Pp 2302-2318 (2024)
Wireless body area networks (WBANs) can provide continuous monitoring of human biological signals. Due to the limited energy of the sensors, a wireless-powered system has been adopted to prolong network lifetime for implant WBANs. In this paper, we pro
External link:
https://doaj.org/article/5ede30ddec5644fbaa2837474640342c