Showing 1 - 10 of 2,805 results for search: '"contextual bandit"'
Author:
Kim, Seok-Jin, Oh, Min-hwan
We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem. We introduce a novel condition, named the Local Anti-Concentration (LAC) condition, which enables a greedy bandit algorithm t
External link:
http://arxiv.org/abs/2411.12878
Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user fe
External link:
http://arxiv.org/abs/2408.14432
Author:
Guo, Hongbo, Zhu, Zheqing
Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize the contextual information and available user or item features, the integration of neural networks has been introduced to enhance conte
External link:
http://arxiv.org/abs/2406.02515
Published in:
International Journal of Industrial Engineering Computations, Vol 15, Iss 4, Pp 951-964 (2024)
The stochastic contextual bandit problem, recognized for its effectiveness in navigating the classic exploration-exploitation dilemma through ongoing player-environment interactions, has found broad applications across various industries. This utilit
External link:
https://doaj.org/article/154a3e0a24d7485c9710d585e63af0dd
Contextual bandit with linear reward functions is one of the most extensively studied models in bandit and online learning research. Recently, there has been increasing interest in designing locally private linear contextual bandit algor
External link:
http://arxiv.org/abs/2404.09413
Traditional imitation learning focuses on modeling the behavioral mechanisms of experts, which requires a large amount of interaction history generated by some fixed expert. However, in many streaming applications, such as streaming recommender syste
External link:
http://arxiv.org/abs/2403.16075
Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy Learning and Evaluation Method
Warfarin, an anticoagulant medication, is formulated to prevent and address conditions associated with abnormal blood clotting, making it one of the most prescribed drugs globally. However, determining the suitable dosage remains challenging due to i
External link:
http://arxiv.org/abs/2402.11123
Published in:
International Journal of Production Research, Jul 2022, Vol. 60, Issue 13, pp. 4090-4116. 27 pp. 9 Diagrams, 7 Charts, 4 Graphs.
Published in:
IEEE Transactions on Vehicular Technology, vol. 72, no. 7, pp. 9099-9114, Jul 2023
The combination of multiple-input multiple-output (MIMO) systems and intelligent reflecting surfaces (IRSs) is foreseen as a critical enabler of beyond 5G (B5G) and 6G. In this work, two different approaches are considered for the joint optimization
External link:
http://arxiv.org/abs/2401.16901
Transformers require a fixed number of layers and heads, which makes them inflexible to the complexity of individual samples and expensive in training and inference. To address this, we propose a sample-based Dynamic Hierarchical Transformer (DHT) mod
External link:
http://arxiv.org/abs/2312.03038