Autor: |
Bastani, O., Ma, Y. J., Shen, E., Xu, W. |
Rok vydání: |
2022 |
Předmět: |
|
Druh dokumentu: |
Working Paper |
Popis: |
In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction. |
Databáze: |
arXiv |
Externí odkaz: |
|