Search results

Report

Authors: Frick, Evan; Li, Tianle; Chen, Connor; Chiang, Wei-Lin; Angelopoulos, Anastasios N.; Jiao, Jiantao; Zhu, Banghua; Gonzalez, Joseph E.; Stoica, Ion

We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). The gold-standard approach is to run a full RLHF training pipeline and directly

External link: http://arxiv.org/abs/2410.14872

Report

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Authors: Li, Tianle; Chiang, Wei-Lin; Frick, Evan; Dunlap, Lisa; Wu, Tianhao; Zhu, Banghua; Gonzalez, Joseph E.; Stoica, Ion

The rapid evolution of Large Language Models (LLMs) has outpaced the development of model evaluation, highlighting the need for continuous curation of new, challenging benchmarks. However, manual curation of high-quality, human-aligned benchmarks is

External link: http://arxiv.org/abs/2406.11939

