Showing 1 - 8 of 8 for search: '"Provodin, Danil"'
Author:
Provodin, Danil; Akker, Bram van den; Katsimerou, Christina; Kaptein, Maurits; Pechenizkiy, Mykola
In supervised machine learning, privileged information (PI) is information that is unavailable at inference, but is accessible during training time. Research on learning using privileged information (LUPI) aims to transfer the knowledge captured in P
External link:
http://arxiv.org/abs/2408.14319
We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically c
External link:
http://arxiv.org/abs/2405.19017
We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically c
External link:
http://arxiv.org/abs/2309.15737
This paper presents a bidding system for sponsored search auctions under an unknown valuation model. This formulation assumes that the bidder's value is unknown, evolving arbitrarily, and observed only upon winning an auction. Unlike previous studies
External link:
http://arxiv.org/abs/2304.00999
We study a posterior sampling approach to efficient exploration in constrained reinforcement learning. As an alternative to existing algorithms, we propose two simple algorithms that are more efficient statistically, simpler to implement and computationa
External link:
http://arxiv.org/abs/2209.03596
We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning.
External link:
http://arxiv.org/abs/2202.06657
We consider a special case of bandit problems, namely batched bandits. Motivated by natural restrictions of recommender systems and e-commerce platforms, we assume that a learning agent observes responses batched in groups over a certain time period.
External link:
http://arxiv.org/abs/2111.02071
Although the bandits framework is a classical and well-suited approach for optimal bidding strategies in sponsored search auctions, industrial attempts are rarely documented. This paper outlines the development process at Zalando, a leading fashion e
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c755f0276c17e6413d8822c0a632e382
http://arxiv.org/abs/2304.00999