Showing 1 - 10 of 20
for search: '"Achab, Mastane"'
Author:
Achab, Mastane
We present a generalization of the proximal operator defined through a convex combination of convex objectives, where the coefficients are updated in a minimax fashion. We prove that this new operator is Bregman firmly nonexpansive with respect to a…
External link:
http://arxiv.org/abs/2411.00928
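The snippet above builds on the classical proximal operator, prox_f(v) = argmin_x f(x) + (1/2)||x - v||^2. As background only (this is not the paper's generalized operator), a minimal sketch of its best-known closed form, soft thresholding for f(x) = lam * |x|:

```python
# Classical proximal operator: prox_f(v) = argmin_x f(x) + (1/2)*(x - v)**2.
# For f(x) = lam * |x| the minimizer is the "soft threshold" of v.
def prox_l1(v, lam):
    if v > lam:
        return v - lam       # shrink toward zero from above
    if v < -lam:
        return v + lam       # shrink toward zero from below
    return 0.0               # small inputs are thresholded to exactly zero
```

The paper replaces the single objective f with a convex combination of several convex objectives whose coefficients are chosen adversarially; the scalar case above is only the starting point.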
This paper explores the effects of various forms of regularization in the context of language model alignment via self-play. While both reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) require collecting costly…
External link:
http://arxiv.org/abs/2404.04291
In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While the choice of a strategy that accomplishes that is optimal with no additional information, it is no longer…
External link:
http://arxiv.org/abs/2310.19821
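As background for the snippet above, a minimal sketch (not from the paper) of a strategy that targets the expected sum of rewards: UCB1 on Bernoulli arms, with illustrative means and horizon:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means; return (total, counts)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # pulls per arm
    sums = [0.0] * k        # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:          # initialization: pull each arm once
            arm = t - 1
        else:               # pick arm maximizing empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts
```

Over a long horizon the arm with the highest mean dominates the pull counts, which is exactly the expected-sum-of-rewards objective the snippet describes.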
In today's era, autonomous vehicles demand a safety level on par with aircraft. Taking a cue from the aerospace industry, which relies on redundancy to achieve high reliability, the automotive sector can also leverage this concept by building redundant…
External link:
http://arxiv.org/abs/2310.03767
Author:
Achab, Mastane
This paper extends the classic theory of convex optimization to the minimization of functions that are equal to the negated logarithm of what we term as a sum-log-concave function, i.e., a sum of log-concave functions. In particular, we show that such…
External link:
http://arxiv.org/abs/2309.15298
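The abstract's "sum-log-concave" objective, F(x) = -log(sum_i exp(-g_i(x))) with each g_i convex, can be illustrated with a toy example (the quadratics g1, g2 below are hypothetical, not from the paper); note that F can be non-convex even though each g_i is convex:

```python
import math

# Toy sum-log-concave objective: F(x) = -log( exp(-g1(x)) + exp(-g2(x)) ).
# Each g_i is convex, so each exp(-g_i) is log-concave.
def g1(x):
    return (x - 1.0) ** 2

def g2(x):
    return (x + 1.0) ** 2

def F(x):
    # Numerically stable negated log-sum-exp of the two negated convex terms.
    a, b = -g1(x), -g2(x)
    m = max(a, b)
    return -(m + math.log(math.exp(a - m) + math.exp(b - m)))
```

Here F has two symmetric local minima near x = 1 and x = -1 and a local maximum at x = 0, so it lies outside classic convex theory; the paper's point is that such objectives remain tractable.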
In this paper, we propose a nested matrix-tensor model which extends the spiked rank-one tensor model of order three. This model is particularly motivated by a multi-view clustering problem in which multiple noisy observations of each data point are…
External link:
http://arxiv.org/abs/2305.19992
Author:
Achab, Mastane, Alami, Reda, Djilali, Yasser Abdelaziz Dahou, Fedyanin, Kirill, Moulines, Eric
Reinforcement learning (RL) allows an agent interacting sequentially with an environment to maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the agent goes beyond the limit of the expected value, to capture the underlying…
External link:
http://arxiv.org/abs/2304.14421
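The distributional-RL idea in the snippet, propagating the full return distribution rather than only its mean, can be sketched with a toy categorical backup on finite {return: probability} dicts (an illustration, not the paper's algorithm):

```python
from collections import defaultdict

# Distributional Bellman backup for a fixed policy: combine the immediate
# reward distribution with the discounted next-state return distribution,
# instead of collapsing everything to an expectation.
def backup(reward_dist, next_dist, gamma=0.9):
    out = defaultdict(float)
    for r, pr in reward_dist.items():
        for z, pz in next_dist.items():
            out[round(r + gamma * z, 10)] += pr * pz
    return dict(out)

def mean(dist):
    return sum(z * p for z, p in dist.items())
```

With a terminal distribution {0.0: 1.0} and a risky reward {0.0: 0.5, 10.0: 0.5}, the expected return is 5, but the backup keeps the two-outcome shape that an expectation-only agent would discard.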
Author:
Achab, Mastane, Neu, Gergely
In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally in distributional…
External link:
http://arxiv.org/abs/2112.15430
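The expected-return setting the snippet starts from can be sketched with textbook value iteration on a hypothetical two-state MDP (the transition table below is illustrative, not from the paper):

```python
# Tiny 2-state MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}

def value_iteration(P, gamma=0.9, iters=500):
    """Iterate the Bellman optimality operator to the fixed point V*."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                    for outs in P[s].values())
             for s in P}
    return V
```

Here the optimal policy reaches state 1 and stays, so V*(1) = 2 / (1 - 0.9) = 20 and V*(0) = 1 + 0.9 * 20 = 19; distributional DP, as in the snippet, replaces the scalar V(s) with a distribution over returns.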
We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still…
External link:
http://arxiv.org/abs/2002.05145
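When training samples come from $P'$ but the risk is under $P$, a standard remedy (a sketch of importance weighting, not necessarily the paper's estimator) reweights each sample by the density ratio dP/dP'; here both distributions are assumed Gaussian so the ratio is explicit:

```python
import math
import random

def density(x, mu, sigma):
    """Gaussian pdf N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def iw_mean(samples, mu_train, mu_test, sigma=1.0):
    """Self-normalized importance-weighted estimate of the test mean E_P[Z]
    from samples drawn under the training distribution P'."""
    w = [density(x, mu_test, sigma) / density(x, mu_train, sigma) for x in samples]
    return sum(wi * xi for wi, xi in zip(w, samples)) / sum(w)
```

Drawing from N(0, 1) and reweighting toward N(0.5, 1) recovers a mean near 0.5, even though no sample was drawn from the test distribution.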
Whereas most dimensionality reduction techniques (e.g. PCA, ICA, NMF) for multivariate data essentially rely on linear algebra to a certain extent, summarizing ranking data, viewed as realizations of a random permutation $\Sigma$ on a set of items in…
External link:
http://arxiv.org/abs/1810.06291
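Ranking data as in the snippet are typically compared not with linear algebra but with the Kendall tau distance, the number of discordant item pairs between two permutations; a minimal sketch (illustrative encoding: position i holds the rank of item i):

```python
from itertools import combinations

def kendall_tau(sigma, tau):
    """Count item pairs (i, j) ranked in opposite order by sigma and tau."""
    n = len(sigma)
    return sum(1 for i, j in combinations(range(n), 2)
               if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0)
```

Identical rankings are at distance 0 and fully reversed rankings are at the maximum distance n(n-1)/2, which is why this metric, rather than a vector-space norm, underlies summaries of a random permutation $\Sigma$.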