Showing 1 - 10 of 229 for search: '"Hazan, Elad"'
Author:
Liu, Y. Isabel, Nguyen, Windsor, Devre, Yagiz, Dogariu, Evan, Majumdar, Anirudha, Hazan, Elad
This paper describes an efficient, open source PyTorch implementation of the Spectral Transform Unit. We investigate sequence prediction tasks over several modalities including language, robotics, and simulated dynamical systems. We find that for the
External link:
http://arxiv.org/abs/2409.10489
The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of p
External link:
http://arxiv.org/abs/2406.01799
Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational
External link:
http://arxiv.org/abs/2402.08929
Fine-tuning is the primary methodology for tailoring pre-trained large language models to specific tasks. As the model's scale and the diversity of tasks expand, parameter-efficient fine-tuning methods are of paramount importance. One of the most wid
External link:
http://arxiv.org/abs/2401.04151
This paper studies sequence modeling for prediction tasks with long-range dependencies. We propose a new formulation for state space models (SSMs) based on learning linear dynamical systems with the spectral filtering algorithm (Hazan et al., 2017).
External link:
http://arxiv.org/abs/2312.06837
We consider regret minimization in repeated games with a very large number of actions. Such games are inherent in the setting of AI Safety via Debate (Irving et al., 2018), and more generally in games whose actions are language-based. Existing algorithms
External link:
http://arxiv.org/abs/2312.04792
Author:
Hazan, Elad, Megiddo, Nimrod
A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after $T$ time periods is $O(\sqrt{T \log T})$, which is the minimum possible up to a logarithmic term. In addition, the new algorithm is
External link:
http://arxiv.org/abs/2307.11668
Author:
Snyder, David, Booker, Meghan, Simon, Nathaniel, Xia, Wenhan, Suo, Daniel, Hazan, Elad, Majumdar, Anirudha
We approach the fundamental problem of obstacle avoidance for robotic systems via the lens of online learning. In contrast to prior work that either assumes worst-case realizations of uncertainty in the environment or a stationary stochastic model of
External link:
http://arxiv.org/abs/2306.08776
We investigate robust model-free reinforcement learning algorithms designed for environments that may be dynamic or even adversarial. Traditional state-based policies often struggle to accommodate the challenges imposed by the presence of unmodeled d
External link:
http://arxiv.org/abs/2305.17552
Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) control are foundational and extensively researched problems in optimal control. We investigate LQR and LQG problems with semi-adversarial perturbations and time-varying adversarial
External link:
http://arxiv.org/abs/2305.15352