Showing 1 - 10 of 8,447 results for search: '"A. Strub"'
Author:
Grinsztajn, Nathan, Flet-Berliac, Yannis, Azar, Mohammad Gheshlaghi, Strub, Florian, Wu, Bill, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Pietquin, Olivier, Geist, Matthieu
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a f
External link:
http://arxiv.org/abs/2406.19188
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Author:
Flet-Berliac, Yannis, Grinsztajn, Nathan, Strub, Florian, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Azar, Mohammad Gheshlaghi, Pietquin, Olivier, Geist, Matthieu
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more
External link:
http://arxiv.org/abs/2406.19185
We present a novel machine learning framework tailored to detect massive black hole binaries observed by spaceborne gravitational wave detectors like the Laser Interferometer Space Antenna (LISA) and predict their future merger times. The detection i
External link:
http://arxiv.org/abs/2405.11340
Author:
Rita, Mathieu, Strub, Florian, Chaabouni, Rahma, Michel, Paul, Dupoux, Emmanuel, Pietquin, Olivier
While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyper
External link:
http://arxiv.org/abs/2404.19409
Author:
Strub, Stefan H., Ferraioli, Luigi, Schmelzbach, Cédric, Stähler, Simon C., Giardini, Domenico
The Laser Interferometer Space Antenna (LISA) is a planned space-based observatory to measure gravitational waves in the millihertz frequency band. This frequency band is expected to be dominated by signals from millions of Galactic binaries and tens
External link:
http://arxiv.org/abs/2403.15318
Author:
Rita, Mathieu, Michel, Paul, Chaabouni, Rahma, Pietquin, Olivier, Dupoux, Emmanuel, Strub, Florian
Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several
External link:
http://arxiv.org/abs/2403.11958
Author:
A2 Collaboration, Afzal, F., Spieker, K., Hurck, P., Abt, S., Achenbach, P., Adlarson, P., Ahmed, Z., Akondi, C. S., Annand, J. R. M., Arends, H. J., Bashkanov, M., Beck, R., Biroth, M., Borisov, N., Braghieri, A., Briscoe, W. J., Cividini, F., Collicott, C., Costanza, S., Denig, A., Dieterle, M., Downie, E. J., Drexler, P., Fegan, S., Gardner, S., Ghosal, D., Glazier, D. I., Gorodnov, I., Gradl, W., Gurevich, D., Heijkenskjöld, L., Hornidge, D., Huber, G. M., Kashevarov, V. L., Kay, S. J. D., Korolija, M., Krusche, B., Lazarev, A., Livingston, K., Lutterer, S., MacGregor, I. J. D., Macrae, R. G., Manley, D. M., Martel, P. P., Miskimen, R., Mocanu, M., Mornacchi, E., Mullen, C., Neganov, A., Neiser, A., Oberle, M., Ostrick, M., Otte, P. B., Paudyal, D., Pedroni, P., Powell, A., Reicherz, G., Rostomyan, T., Sfienti, C., Sokhoyan, V., Steffen, O., Strakovsky, I. I., Strub, T., Supek, I., Thiel, A., Thiel, M., Thomas, A., Usov, Yu. A., Wagner, S., Walford, N. K., Watts, D. P., Werthmüller, D., Wettig, J., Witthauer, L., Wolfes, M., Zachariou, N.
We report the measurement of the helicity asymmetry $E$ for the $p\pi^0$ and $n\pi^+$ final states using, for the first time, an elliptically polarized photon beam in combination with a longitudinally polarized target at the Crystal Ball experiment a
External link:
http://arxiv.org/abs/2402.05531
We study a discrete-time consumption-based capital asset pricing model under expectations-based reference-dependent preferences. More precisely, we consider an endowment economy populated by a representative agent who derives utility from current con
External link:
http://arxiv.org/abs/2401.12856
Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon know
External link:
http://arxiv.org/abs/2312.07551
We consider a new framework of predictable relative forward performance processes (PRFPP) to study portfolio management within a competitive environment. Each agent trades a distinct stock following a binomial distribution with probabilities for a po
External link:
http://arxiv.org/abs/2311.04841