Search results

Report

Time transformation between the solar system barycenter and the surfaces of the Earth and Moon

Authors: Turyshev, Slava G., Williams, James G., Boggs, Dale H., Park, Ryan S.

The transformation of time between the surface of the Earth, the solar system barycenter, and the surface of the Moon involves relativistic corrections. For solar system Barycentric Dynamical Time (TDB), we also require that there be no rate differen

External link: http://arxiv.org/abs/2406.16147

Report

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Authors: Rafailov, Rafael, Chittepu, Yaswanth, Park, Ryan, Sikchi, Harshit, Hejna, Joey, Knox, Bradley, Finn, Chelsea, Niekum, Scott

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represen

External link: http://arxiv.org/abs/2406.02900

Report

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Authors: Rafailov, Rafael, Hejna, Joey, Park, Ryan, Finn, Chelsea

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preferen

External link: http://arxiv.org/abs/2404.12358

Report

Smooth Information Gathering in Two-Player Noncooperative Games

Authors: Palafox, Fernando, Milzman, Jesse, Lee, Dong Ho, Park, Ryan, Fridovich-Keil, David

We present a mathematical framework for modeling two-player noncooperative games in which one player (the defender) is uncertain of the costs of the game and the second player's (the attacker's) intention but can preemptively allocate information-gat

External link: http://arxiv.org/abs/2404.00733

Report

Disentangling Length from Quality in Direct Preference Optimization

Authors: Park, Ryan, Rafailov, Rafael, Ermon, Stefano, Finn, Chelsea

Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent answer is of

External link: http://arxiv.org/abs/2403.19159

Report

Strong resemblance between surface and deep zonal winds inside Jupiter revealed by high-degree gravity moments

Authors: Cao, Hao, Bloxham, Jeremy, Park, Ryan S., Militzer, Burkhard, Yadav, Rakesh K., Kulowski, Laura, Stevenson, David J., Bolton, Scott J.

Published in: ApJ 959, 78 (2023)

Jupiter's atmosphere-interior is a coupled fluid dynamical system strongly influenced by the rapid background rotation. While the visible atmosphere features east-west zonal winds on the order of 100 m/s (Tollefson et al. 2017), zonal flows in the dy

External link: http://arxiv.org/abs/2311.11494

Report

Preference Optimization for Molecular Language Models

Authors: Park, Ryan, Theisen, Ryan, Sahni, Navriti, Patek, Marcel, Cichońska, Anna, Rahman, Rayees

Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not a priori encode certain preferences a chemist may desire. We investigate fine-tuning using Direct Preference

External link: http://arxiv.org/abs/2310.12304

Report

The Hera Radio Science Experiment at Didymos

Authors: Gramigna, Edoardo, Manghi, Riccardo Lasagni, Zannoni, Marco, Tortora, Paolo, Park, Ryan S., Tommei, Giacomo, Maistre, Sébastien Le, Michel, Patrick, Castellini, Francesco, Kueppers, Michael

Published in: Planetary and Space Science, 2024, 105906

Hera is the European Space Agency's first planetary defense space mission and plays a pivotal role in the Asteroid Impact and Deflection Assessment international collaboration with NASA's DART mission, which performed the first asteroid defle

External link: http://arxiv.org/abs/2310.11883

Report

ThunderBoltz: An Open-Source DSMC-based Boltzmann Solver for Plasma Transport, Chemical Kinetics, and 0D Plasma Modeling

Authors: Park, Ryan, Scheiner, Brett S., Zammit, Mark C.

Plasma-neutral interactions, including reactive kinetics, are often studied either in 0D using ODE-based descriptions or in multi-dimensional fluid or particle-based plasma codes. The latter case involves a complex assembly of procedures that are no

External link: http://arxiv.org/abs/2310.07913
