MEET: A Monte Carlo Exploration-Exploitation Trade-Off for Buffer Sampling
Author: | Ott, Julius; Servadei, Lorenzo; Arjona-Medina, Jose; Rinaldi, Enrico; Mauro, Gianfranco; Lopera, Daniela Sánchez; Stephan, Michael; Stadelmayer, Thomas; Santra, Avik; Wille, Robert |
Year of publication: | 2023 |
Subject: | |
Source: | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
DOI: | 10.1109/icassp49357.2023.10095236 |
Description: | Data selection is essential for any data-driven optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty into the Q-Value estimation. Consequently, they cannot adapt their sampling strategy, which balances exploration and exploitation of transitions, to the complexity of the task. To address this, the paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. It is enabled by an uncertainty estimate of the Q-Value function, which guides sampling toward more significant transitions and thus toward learning a more efficient policy. Experiments on classical control environments demonstrate stable results and show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards, improving convergence and peak performance by 26% on average. Accepted at ICASSP 2023. |
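The core idea in the abstract, sampling replay-buffer transitions according to the uncertainty of their Q-Value estimates, can be illustrated with a minimal sketch. This is not the authors' implementation: the ensemble-based uncertainty, the softmax weighting, and the `beta` temperature are illustrative assumptions standing in for whatever estimator the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy replay buffer: each transition carries Q-Value estimates from an
# ensemble of K heads (a hypothetical stand-in for the paper's
# uncertainty estimation; sizes and names are illustrative).
num_transitions, num_heads = 100, 5
q_ensemble = rng.normal(size=(num_transitions, num_heads))

# Epistemic uncertainty per transition: spread across ensemble heads.
uncertainty = q_ensemble.std(axis=1)

def sampling_probs(uncertainty, beta=1.0):
    """Softmax over per-transition uncertainty.

    beta trades exploration against exploitation: large beta favors
    highly uncertain transitions, beta -> 0 recovers uniform sampling.
    """
    logits = beta * uncertainty
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

probs = sampling_probs(uncertainty, beta=2.0)
batch = rng.choice(num_transitions, size=32, replace=False, p=probs)
```

In this sketch the agent would train on `batch` instead of a uniformly drawn minibatch, so transitions with uncertain Q-Values are revisited more often.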
Database: | OpenAIRE |
External link: |