Reinforcement Mechanism Design for e-commerce

Authors:	Yiwei Zhang, Pingzhong Tang, Aris Filos-Ratsikas, Qingpeng Cai
Year of publication:	2018
Subjects:	FOS: Computer and information sciences; reinforcement learning; Mathematical optimization; Computer Science - Artificial Intelligence; Computer science; Stability (learning theory); Rationality; 010501 environmental sciences; 01 natural sciences; 0502 economics and business; e-commerce; Revenue; Computer Science - Multiagent Systems; 050205 econometrics; 0105 earth and related environmental sciences; Mechanism design; 05 social sciences; Total revenue; Artificial Intelligence (cs.AI); impression allocation; Problem domain; Markov decision process; Heuristics; Multiagent Systems (cs.MA)
Source:	WWW; Cai, Q, Filos-Ratsikas, A, Tang, P & Zhang, Y 2018, 'Reinforcement Mechanism Design for E-Commerce', in Proceedings of the 2018 World Wide Web Conference (WWW '18), pp. 1339–1348, The Web Conference 2018, Lyon, France, 23/04/18. https://doi.org/10.1145/3178876.3186039
Description:	We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform. We employ a general framework of reinforcement mechanism design, which uses deep reinforcement learning to design efficient algorithms, taking the strategic behaviour of the sellers into account. Specifically, we model the impression allocation problem as a Markov decision process, where the states encode the history of impressions, prices, transactions and generated revenue and the actions are the possible impression allocations in each round. To tackle the problem of continuity and high-dimensionality of states and actions, we adopt the ideas of the DDPG algorithm to design an actor-critic policy gradient algorithm which takes advantage of the problem domain in order to achieve convergence and stability. We evaluate our proposed algorithm, coined IA(GRU), by comparing it against DDPG, as well as several natural heuristics, under different rationality models for the sellers: we assume that sellers follow well-known no-regret type strategies which may vary in their degree of sophistication. We find that IA(GRU) outperforms all algorithms in terms of the total revenue.
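The abstract frames impression allocation as a Markov decision process: each round the platform chooses how to split impressions among sellers, sellers set prices by a no-regret rule, and the state accumulates impressions, prices, transactions and revenue. Below is a minimal, hypothetical Python sketch of such an environment, not the authors' IA(GRU) algorithm or their simulator. The price grid, the linear demand curve `conversion`, the choice of Hedge (exponential weights) as the sellers' no-regret strategy, and the revenue-proportional heuristic `proportional_to_revenue` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy parameters, not taken from the paper.
PRICES = [1.0, 2.0, 3.0, 4.0, 5.0]  # discrete prices a seller may post
MAX_PRICE = 6.0  # assumed: no buyer purchases at or above this price


def conversion(price: float) -> float:
    """Assumed linear demand: purchase probability per impression."""
    return max(0.0, 1.0 - price / MAX_PRICE)


class HedgeSeller:
    """Seller that picks prices with Hedge (exponential weights), a no-regret rule."""

    def __init__(self, eta: float, rng: random.Random):
        self.eta = eta
        self.rng = rng
        self.weights = [1.0] * len(PRICES)

    def choose_price(self) -> float:
        return self.rng.choices(PRICES, weights=self.weights)[0]

    def update(self) -> None:
        # Full-information update: each price is rewarded by its expected revenue
        # per impression, scaled into [0, 1] by the best achievable value.
        best = max(p * conversion(p) for p in PRICES)
        self.weights = [w * math.exp(self.eta * p * conversion(p) / best)
                        for w, p in zip(self.weights, PRICES)]
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]  # keep weights bounded


def step(allocation, sellers, total_impressions):
    """One MDP round: the action is the impression share given to each seller."""
    prices = [s.choose_price() for s in sellers]
    impressions = [a * total_impressions for a in allocation]
    sales = [imp * conversion(p) for imp, p in zip(impressions, prices)]  # expected units
    revenue = [p * q for p, q in zip(prices, sales)]
    for s in sellers:
        s.update()
    # One entry of the state history: impressions, prices, transactions, revenue.
    return {"impressions": impressions, "prices": prices,
            "sales": sales, "revenue": revenue}


def proportional_to_revenue(history, n, explore=0.1):
    """Hypothetical heuristic: split by last round's revenue, plus uniform exploration."""
    if not history or sum(history[-1]["revenue"]) == 0:
        return [1.0 / n] * n
    last = history[-1]["revenue"]
    total = sum(last)
    return [(1 - explore) * r / total + explore / n for r in last]


def simulate(rounds=50, n_sellers=3, total_impressions=1000.0, seed=0):
    rng = random.Random(seed)
    sellers = [HedgeSeller(eta=0.1, rng=rng) for _ in range(n_sellers)]
    history = []
    for _ in range(rounds):
        allocation = proportional_to_revenue(history, n_sellers)
        history.append(step(allocation, sellers, total_impressions))
    return history


if __name__ == "__main__":
    history = simulate()
    print("total revenue:", round(sum(sum(r["revenue"]) for r in history), 2))
```

In the framework the abstract describes, a learned actor-critic policy would occupy the slot that `proportional_to_revenue` fills here, choosing each round's allocation from the state history; the exploration term in the heuristic only prevents a seller with zero revenue from being starved of impressions permanently.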
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a443b0fc65d383f2f7ad3e3abea7f527 https://doi.org/10.1145/3178876.3186039