Showing 1 - 10 of 2,105 results for search: '"John, C. S."'
Learning a transition model via Maximum Likelihood Estimation (MLE) followed by planning inside the learned model is perhaps the most standard and simplest Model-based Reinforcement Learning (RL) framework. In this work, we show that such a simple Mo
External link:
http://arxiv.org/abs/2408.08994
This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and then alter their reward observation. We study two cases of this model, with or wit
External link:
http://arxiv.org/abs/2408.08859
The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that can guarant
External link:
http://arxiv.org/abs/2407.20124
We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing and online advertising, where immediate feedback is not immed
External link:
http://arxiv.org/abs/2407.15439
Author:
Liu, Xutong, Wang, Siwei, Zuo, Jinhang, Zhong, Han, Wang, Xuchuang, Wang, Zhiyong, Li, Shuai, Hajiesmaili, Mohammad, Lui, John C. S., Chen, Wei
We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a g
External link:
http://arxiv.org/abs/2406.01386
With the rapid advancement of large language models (LLMs), the diversity of multi-LLM tasks and the variability in their pricing structures have become increasingly important, as costs can vary greatly between different LLMs. To tackle these challen
External link:
http://arxiv.org/abs/2405.16587
Conversational recommender systems have emerged as a potent solution for efficiently eliciting user preferences. These systems interactively present queries associated with "key terms" to users and leverage user feedback to estimate user preferences
External link:
http://arxiv.org/abs/2405.02881
We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, which is the summation of the change of th
External link:
http://arxiv.org/abs/2403.10732
Author:
Yang, Hantao, Liu, Xutong, Wang, Zhiyong, Xie, Hong, Lui, John C. S., Lian, Defu, Chen, Enhong
We study the problem of federated contextual combinatorial cascading bandits, where $|\mathcal{U}|$ agents collaborate under the coordination of a central server to provide tailored recommendations to the $|\mathcal{U}|$ corresponding users. Existing
External link:
http://arxiv.org/abs/2402.16312
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds, without touching private data owned b
External link:
http://arxiv.org/abs/2402.03770