Showing 1 - 3 of 3 for search: '"Nath, Vaskar"'
The Superficial Alignment Hypothesis posits that almost all of a language model's abilities and knowledge are learned during pre-training, while post-training is about giving a model the right style and format. We re-examine these claims by empirical…
External link:
http://arxiv.org/abs/2410.03717
Author:
Wang, Evan, Cassano, Federico, Wu, Catherine, Bai, Yunfeng, Song, Will, Nath, Vaskar, Han, Ziwen, Hendryx, Sean, Yue, Summer, Zhang, Hugh
While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs, leading t…
External link:
http://arxiv.org/abs/2409.03733
Author:
Nath, Vaskar, Slack, Dylan, Da, Jeff, Ma, Yuntao, Zhang, Hugh, Whitehead, Spencer, Hendryx, Sean
Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinf…
External link:
http://arxiv.org/abs/2407.13887