Search results

Report

Averaging log-likelihoods in direct alignment

Authors: Grinsztajn, Nathan; Flet-Berliac, Yannis; Azar, Mohammad Gheshlaghi; Strub, Florian; Wu, Bill; Choi, Eugene; Cremer, Chris; Ahmadian, Arash; Chandak, Yash; Pietquin, Olivier; Geist, Matthieu

To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a f

External link: http://arxiv.org/abs/2406.19188

Report

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Authors: Flet-Berliac, Yannis; Grinsztajn, Nathan; Strub, Florian; Choi, Eugene; Cremer, Chris; Ahmadian, Arash; Chandak, Yash; Azar, Mohammad Gheshlaghi; Pietquin, Olivier; Geist, Matthieu

Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more

External link: http://arxiv.org/abs/2406.19185

Report

Detection and Prediction of Future Massive Black Hole Mergers with Machine Learning and Truncated Waveforms

Authors: Houba, Niklas; Strub, Stefan H.; Ferraioli, Luigi; Giardini, Domenico

We present a novel machine learning framework tailored to detect massive black hole binaries observed by spaceborne gravitational wave detectors like the Laser Interferometer Space Antenna (LISA) and predict their future merger times. The detection i

External link: http://arxiv.org/abs/2405.11340

Report

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

Authors: Rita, Mathieu; Strub, Florian; Chaabouni, Rahma; Michel, Paul; Dupoux, Emmanuel; Pietquin, Olivier

While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyper

External link: http://arxiv.org/abs/2404.19409

Report

Global Analysis of LISA Data with Galactic Binaries and Massive Black Hole Binaries

Authors: Strub, Stefan H.; Ferraioli, Luigi; Schmelzbach, Cédric; Stähler, Simon C.; Giardini, Domenico

The Laser Interferometer Space Antenna (LISA) is a planned space-based observatory to measure gravitational waves in the millihertz frequency band. This frequency band is expected to be dominated by signals from millions of Galactic binaries and tens

External link: http://arxiv.org/abs/2403.15318

Report

Language Evolution with Deep Learning

Authors: Rita, Mathieu; Michel, Paul; Chaabouni, Rahma; Pietquin, Olivier; Dupoux, Emmanuel; Strub, Florian

Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several

External link: http://arxiv.org/abs/2403.11958

Report

First measurement using elliptically polarized photons of the double-polarization observable $E$ for $\gamma p \to p \pi^0$ and $\gamma p \to n \pi^+$

Authors: A2 Collaboration; Afzal, F.; Spieker, K.; Hurck, P.; Abt, S.; Achenbach, P.; Adlarson, P.; Ahmed, Z.; Akondi, C. S.; Annand, J. R. M.; Arends, H. J.; Bashkanov, M.; Beck, R.; Biroth, M.; Borisov, N.; Braghieri, A.; Briscoe, W. J.; Cividini, F.; Collicott, C.; Costanza, S.; Denig, A.; Dieterle, M.; Downie, E. J.; Drexler, P.; Fegan, S.; Gardner, S.; Ghosal, D.; Glazier, D. I.; Gorodnov, I.; Gradl, W.; Gurevich, D.; Heijkenskjöld, L.; Hornidge, D.; Huber, G. M.; Kashevarov, V. L.; Kay, S. J. D.; Korolija, M.; Krusche, B.; Lazarev, A.; Livingston, K.; Lutterer, S.; MacGregor, I. J. D.; Macrae, R. G.; Manley, D. M.; Martel, P. P.; Miskimen, R.; Mocanu, M.; Mornacchi, E.; Mullen, C.; Neganov, A.; Neiser, A.; Oberle, M.; Ostrick, M.; Otte, P. B.; Paudyal, D.; Pedroni, P.; Powell, A.; Reicherz, G.; Rostomyan, T.; Sfienti, C.; Sokhoyan, V.; Steffen, O.; Strakovsky, I. I.; Strub, T.; Supek, I.; Thiel, A.; Thiel, M.; Thomas, A.; Usov, Yu. A.; Wagner, S.; Walford, N. K.; Watts, D. P.; Werthmüller, D.; Wettig, J.; Witthauer, L.; Wolfes, M.; Zachariou, N.

We report the measurement of the helicity asymmetry $E$ for the $p\pi^0$ and $n\pi^+$ final states using, for the first time, an elliptically polarized photon beam in combination with a longitudinally polarized target at the Crystal Ball experiment a

External link: http://arxiv.org/abs/2402.05531

Report

Reference-dependent asset pricing with a stochastic consumption-dividend ratio

Authors: Aquino, Luca De Gennaro; He, Xuedong; Strub, Moris Simon; Yang, Yuting

We study a discrete-time consumption-based capital asset pricing model under expectations-based reference-dependent preferences. More precisely, we consider an endowment economy populated by a representative agent who derives utility from current con

External link: http://arxiv.org/abs/2401.12856

Report

Language Model Alignment with Elastic Reset

Authors: Noukhovitch, Michael; Lavoie, Samuel; Strub, Florian; Courville, Aaron

Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon know

External link: http://arxiv.org/abs/2312.07551

Report

Predictable Relative Forward Performance Processes: Multi-Agent and Mean Field Games for Portfolio Management

Authors: Liang, Gechun; Strub, Moris S.; Wang, Yuwei

We consider a new framework of predictable relative forward performance processes (PRFPP) to study portfolio management within a competitive environment. Each agent trades a distinct stock following a binomial distribution with probabilities for a po

External link: http://arxiv.org/abs/2311.04841