Search results - "Knox, Bradley"

Report

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Authors: Rafailov, Rafael; Chittepu, Yaswanth; Park, Ryan; Sikchi, Harshit; Hejna, Joey; Knox, Bradley; Finn, Chelsea; Niekum, Scott

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represen

External link: http://arxiv.org/abs/2406.02900

Academic article

From Rectangular Bands to κ-Primal Algebras.

Authors: Davey, Brian A.; Knox, Bradley J.

Published in: Semigroup Forum. 2002, Vol. 64 Issue 1, p29. 26p.

Computational, Neuroscientific, and Lifespan Perspectives on the Exploration-Exploitation Dilemma

Authors: Otto, A. Ross; Knox, Bradley; Love, Bradley; Gershman, Sam; Niv, Yael; Worthy, Darrell; Maddox, Todd; Hotaling, Jared; Busemeyer, Jerome; Shiffrin, Richard

Published in: Otto, A. Ross; Knox, Bradley; Love, Bradley; Gershman, Sam; Niv, Yael; Worthy, Darrell; et al. (2011). Computational, Neuroscientific, and Lifespan Perspectives on the Exploration-Exploitation Dilemma. Proceedings of the Cognitive Science Society, 33(33). Retrieved from: http://www.escholarship.org/uc/item/6k90f413

External link: https://explore.openaire.eu/search/publication?articleId=od_______325::f8a511672ab247e7bfb079d6858c2bfb
http://www.escholarship.org/uc/item/6k90f413
