Showing 1 - 10 of 42 476 for search: '"Azar, A."'
Published in:
Trans. Theor. Math. Phys. (TTMP), vol. 1(4), 2024
In solving the Brans-Dicke (BD) equations in the BD theory of gravity, their linear independence is important. This is due to the fact that, in solving these equations in cosmology, if the number of unknown quantities is equal to the number of independent
External link:
http://arxiv.org/abs/2410.13316
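For context on the abstract above: the Brans-Dicke field equations whose linear independence is at issue are, in their standard form (units with $c = 1$; not reproduced from this paper),
\[
G_{\mu\nu} = \frac{8\pi}{\phi} T_{\mu\nu} + \frac{\omega}{\phi^{2}}\Big(\nabla_{\mu}\phi\,\nabla_{\nu}\phi - \tfrac{1}{2} g_{\mu\nu}\,\nabla_{\alpha}\phi\,\nabla^{\alpha}\phi\Big) + \frac{1}{\phi}\big(\nabla_{\mu}\nabla_{\nu}\phi - g_{\mu\nu}\,\Box\phi\big),
\qquad
\Box\phi = \frac{8\pi}{3+2\omega}\,T,
\]
so in a homogeneous cosmology the count of independent equations versus unknowns (the scale factor $a(t)$, the scalar $\phi(t)$ and the matter variables) determines whether the system is closed.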
Author:
Azar, Eyar, Nadler, Boaz
The premise of semi-supervised learning (SSL) is that combining labeled and unlabeled data yields significantly more accurate models. Despite empirical successes, the theoretical understanding of SSL is still far from complete. In this work, we study
External link:
http://arxiv.org/abs/2409.03335
Crépey, Frikha, and Louzi (2023) introduced a multilevel stochastic approximation scheme to compute the value-at-risk of a financial loss that is only simulatable by Monte Carlo. The optimal complexity of the scheme is in $O({\varepsilon}^{-5/2})$,
External link:
http://arxiv.org/abs/2408.06531
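As background for the abstract above (standard definitions, not specific to this paper): the value-at-risk of a loss $L$ at confidence level $\alpha$ is the quantile
\[
\mathrm{VaR}_{\alpha}(L) = \inf\{x \in \mathbb{R} : \mathbb{P}(L \le x) \ge \alpha\},
\]
and the quoted complexity $O(\varepsilon^{-5/2})$ refers to the simulation cost needed to reach accuracy $\varepsilon$ when $L$ itself can only be sampled by Monte Carlo.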
In this short article, using a left-invariant Randers metric $F$, we define a new left-invariant Randers metric $\tilde{F}$. We show that $F$ is of Berwald (Douglas) type if and only if $\tilde{F}$ is of Berwald (Douglas) type. In the case of Berwald
External link:
http://arxiv.org/abs/2407.21044
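Recall, for the abstract above, the standard form of a Randers metric and two classical facts (not taken from this paper): given a Riemannian metric $a$ and a $1$-form $b$ with $\|b\|_{a} < 1$,
\[
F(x,y) = \alpha(x,y) + \beta(x,y) = \sqrt{a_{ij}(x)\,y^{i}y^{j}} + b_{i}(x)\,y^{i},
\]
and $F$ is of Berwald type iff $\beta$ is parallel with respect to $\alpha$, while $F$ is of Douglas type iff $\beta$ is closed.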
Author:
Grinsztajn, Nathan, Flet-Berliac, Yannis, Azar, Mohammad Gheshlaghi, Strub, Florian, Wu, Bill, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Pietquin, Olivier, Geist, Matthieu
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a f
External link:
http://arxiv.org/abs/2406.19188
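For reference, the KL-regularized RLHF objective and the DPO-style direct alignment loss mentioned in the abstract above are commonly written as (generic statements, not this paper's contribution)
\[
\max_{\pi}\; \mathbb{E}_{x,\, y\sim\pi(\cdot\mid x)}\big[r(x,y)\big] \;-\; \beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big),
\]
\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log\sigma\!\left(\beta\log\frac{\pi_{\theta}(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_{\theta}(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right],
\]
where $y_w$ and $y_l$ are the preferred and rejected responses and $\pi_{\mathrm{ref}}$ is the reference policy.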
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Author:
Flet-Berliac, Yannis, Grinsztajn, Nathan, Strub, Florian, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Azar, Mohammad Gheshlaghi, Pietquin, Olivier, Geist, Matthieu
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more
External link:
http://arxiv.org/abs/2406.19185
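As background for the title and abstract above, the generic policy-gradient estimator on sequence-level scores (a textbook form, not the contrastive objective introduced in the paper) is
\[
\nabla_{\theta} J(\theta) = \mathbb{E}_{x,\, y\sim\pi_{\theta}(\cdot\mid x)}\big[\big(R(x,y) - b(x)\big)\,\nabla_{\theta}\log\pi_{\theta}(y\mid x)\big],
\]
with $b(x)$ an arbitrary baseline that reduces variance without biasing the gradient.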
Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem that their optimal solution is highly task-de
External link:
http://arxiv.org/abs/2406.01660
Author:
Richemond, Pierre Harvey, Tang, Yunhao, Guo, Daniel, Calandriello, Daniele, Azar, Mohammad Gheshlaghi, Rafailov, Rafael, Pires, Bernardo Avila, Tarassov, Eugene, Spangher, Lucas, Ellsworth, Will, Severyn, Aliaksei, Mallinson, Jonathan, Shani, Lior, Shamir, Gil, Joshi, Rishabh, Liu, Tianqi, Munos, Remi, Piot, Bilal
The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is
External link:
http://arxiv.org/abs/2405.19107
Author:
Dowling, Neil, West, Maxwell T., Southwell, Angus, Nakhl, Azar C., Sevior, Martin, Usman, Muhammad, Modi, Kavan
Despite their ever more widespread deployment throughout society, machine learning algorithms remain critically vulnerable to being spoofed by subtle adversarial tampering with their input data. The prospect of near-term quantum computers being capab
External link:
http://arxiv.org/abs/2405.10360
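For the abstract above, the usual formalisation of adversarial tampering (the standard adversarial-example definition, not specific to the quantum setting studied in the paper) is a norm-bounded perturbation chosen to maximise the model's loss,
\[
\delta^{\star} = \arg\max_{\|\delta\| \le \varepsilon}\; \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big),
\]
so robustness is assessed by how much the prediction on $x+\delta^{\star}$ degrades for small $\varepsilon$.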
In this paper we consider an inflating universe with a long straight cosmic string along the z-axis. We show that the effect of the cosmic string can be taken as a perturbation on the background of the FRW metric. Then, by doing cosmological perturbations on this i
External link:
http://arxiv.org/abs/2405.02470
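For context on the abstract above (standard line elements, not reproduced from this paper): the spatially flat FRW background and the exterior metric of a static straight string along the $z$-axis are commonly written as
\[
ds^{2}_{\mathrm{FRW}} = -dt^{2} + a^{2}(t)\big(dr^{2} + r^{2}d\varphi^{2} + dz^{2}\big),
\qquad
ds^{2}_{\mathrm{string}} = -dt^{2} + dz^{2} + dr^{2} + (1-4G\mu)^{2} r^{2} d\varphi^{2},
\]
where $\mu$ is the string tension; the conical deficit angle $8\pi G\mu$ is tiny for realistic strings, which is why the string can be treated as a perturbation of the FRW metric.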