Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Author:	Davar, Parisa; Godin, Frédéric; Garrido, Jose
Publication year:	2024
Subject:	Computer Science - Machine Learning; Quantitative Finance - Risk Management
Document type:	Working Paper
Description:	This paper addresses the mitigation of catastrophic risk (risk with very low frequency but very high severity) in a sequential decision-making process. The problem is particularly challenging because observations in the far tail of the distribution of cumulative costs (negative rewards) are scarce. We develop a policy gradient algorithm, called POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments show that the method outperforms common benchmarks that rely on the empirical distribution. An application to financial risk management, specifically the dynamic hedging of a financial option, is presented. Comment: The Python code to replicate the numerical experiments of this paper is available at https://github.com/parisadavar/EVT-policy-gradient-RL
Database:	arXiv
External link:	http://arxiv.org/abs/2406.15612