Showing 1 - 10 of 80 for search: '"Cohen, Alon"'
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algo
External link:
http://arxiv.org/abs/2406.12406
We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is corre
External link:
http://arxiv.org/abs/2405.10027
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing
External link:
http://arxiv.org/abs/2311.13877
We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret where $K$ denotes the number of episodes. Our work is the first to establish the optimal (w.r.t. $K$) rate of co
External link:
http://arxiv.org/abs/2308.14642
We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled to succeed. This problem is formulated as mutual training of skills using an int
External link:
http://arxiv.org/abs/2308.12649
We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algor
External link:
http://arxiv.org/abs/2303.01464
We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of realizable function class and access to offline least squares and l
External link:
http://arxiv.org/abs/2211.14932
We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and cost function. We present the first computationally-efficient algorithm that attains an optima
External link:
http://arxiv.org/abs/2206.01426
We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and cost function. We present a computationally efficient algorithm that attains an optimal $\sqrt{T}$ regret
External link:
http://arxiv.org/abs/2203.01170
We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts asynchronous distribut
External link:
http://arxiv.org/abs/2106.11879
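
The delayed-gradient setting described in the last abstract above (an update at step $t$ that uses a stale stochastic gradient from step $t - d_t$) can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name `delayed_sgd`, the fixed step size, the Gaussian gradient noise, and the clamping of early steps to the initial iterate are all assumptions made for illustration.

```python
import random


def delayed_sgd(grad, x0, delays, step_size, noise=0.0, seed=0):
    """Plain SGD where step t applies a gradient evaluated at the iterate
    from step t - d_t (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    history = [x0]  # history[t] is the iterate at the start of step t
    x = x0
    for t, d in enumerate(delays):
        # Assumption: if the delay reaches before step 0, use the initial iterate.
        stale_t = max(t - d, 0)
        # Stale stochastic gradient: true gradient at the old iterate plus noise.
        g = grad(history[stale_t]) + rng.gauss(0.0, noise)
        x = x - step_size * g
        history.append(x)
    return x


# Example objective (an assumption): f(x) = x^2 / 2, whose gradient is x.
no_delay = delayed_sgd(lambda x: x, 5.0, [0] * 50, 0.1)
with_delay = delayed_sgd(lambda x: x, 5.0, [2] * 200, 0.1)
```

With all delays set to zero the loop reduces to ordinary SGD; with nonzero delays each update reads an older entry of `history`, which is the staleness the abstract refers to.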