Showing 1 - 10 of 28 for search: '"Malik, Dhruv"'
The traditional viewpoint on Sparse Mixture of Experts (MoE) models is that instead of training a single large expert, which is computationally expensive, we can train many small experts. The hope is that if the total parameter count of the small exp…
External link:
http://arxiv.org/abs/2409.00879
Tor, one of the most popular censorship circumvention systems, faces regular blocking attempts by censors. Thus, to facilitate access, it relies on "pluggable transports" (PTs) that disguise Tor's traffic and make it hard for the adversary to block T…
External link:
http://arxiv.org/abs/2309.14856
We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set. In some simple cases, such problems can be expressed in…
External link:
http://arxiv.org/abs/2306.05649
In recommender system or crowdsourcing applications of online learning, a human's preferences or abilities are often a function of the algorithm's recent actions. Motivated by this, a significant line of work has formalized settings where an action's…
External link:
http://arxiv.org/abs/2305.02955
Adaptive optimization methods are well known to achieve superior convergence relative to vanilla gradient methods. The traditional viewpoint in optimization, particularly in convex optimization, explains this improved performance by arguing that, unl…
External link:
http://arxiv.org/abs/2211.02254
Policy regret is a well established notion of measuring the performance of an online learning algorithm against an adaptive adversary. We study restrictions on the adversary that enable efficient minimization of the \emph{complete policy regret}, whi…
External link:
http://arxiv.org/abs/2204.11174
Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure…
External link:
http://arxiv.org/abs/2106.07814
Agents trained by reinforcement learning (RL) often fail to generalize beyond the environment they were trained in, even when presented with new scenarios that seem similar to the training environment. We study the query complexity required to train…
External link:
http://arxiv.org/abs/2101.00300
Author:
Malik, Dhruv, Pananjady, Ashwin, Bhatia, Kush, Khamaru, Koulik, Bartlett, Peter L., Wainwright, Martin J.
We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and rew…
External link:
http://arxiv.org/abs/1812.08305
Academic article