Author: |
Ritesh Agarwal, Craig Boutilier, Sanmit Narvekar, Eugene Ie, Jing Wang, Rui Wu, Vihan Jain, Tushar Deepak Chandra, Heng-Tze Cheng |
Year of publication: |
2019 |
Subject: |
|
Source: |
IJCAI |
DOI: |
10.24963/ijcai.2019/360 |
Description: |
Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube. |
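A minimal sketch of the decomposition described above (notation is illustrative rather than quoted from the paper, and assumes a user choice model $P(i \mid s, A)$ that selects at most one item $i$ from the slate $A$ in state $s$): the slate's long-term value can be written as $Q(s, A) = \sum_{i \in A} P(i \mid s, A)\,\bar{Q}(s, i)$, where $\bar{Q}(s, i)$ denotes the item-wise LTV of item $i$ in state $s$; this reduces learning over combinatorial slates to learning item-level values combined through the choice model. |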
Database: |
OpenAIRE |
External link: |
|