Showing 1 - 10 of 61
for search: '"Bhatia, Kush"'
Author:
Sarukkai, Vishnu, Shacklett, Brennan, Majercik, Zander, Bhatia, Kush, Ré, Christopher, Fatahalian, Kayvon
Large Language Models (LLMs) have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks. However, they often need many iterations of trial-and-error to generate effective reward functions. This p…
External link:
http://arxiv.org/abs/2410.09187
Fine-tuning large language models (LLMs) on instruction datasets is a common way to improve their generative capabilities. However, instruction datasets can be expensive and time-consuming to manually curate, and while LLM-generated data is less labo…
External link:
http://arxiv.org/abs/2410.05224
Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training linear Transformers from scratch, (2) "finetuned-convers…
External link:
http://arxiv.org/abs/2402.04347
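The complexity claim in the abstract above can be illustrated with a small sketch: attention is computed once with the usual softmax (quadratic in sequence length) and once with a kernel feature map, which makes the cost linear. The feature map `phi` (elu + 1) and the toy shapes are illustrative assumptions, not the method of the linked paper.

```python
# Minimal sketch of the linear-attention idea (not the specific method of
# the linked paper): replacing softmax(QK^T)V with a feature map phi lets
# us compute phi(K)^T V first, so cost is linear in sequence length n
# instead of quadratic. phi = elu(x) + 1 is one common illustrative choice.
import numpy as np

def phi(x):
    # Simple positive feature map; an assumption for illustration only.
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    # Standard attention: the n x n score matrix makes this O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Linear attention: associativity lets us form the d x d matrix
    # phi(K)^T V once, giving O(n * d^2) total work.
    Kp, Qp = phi(K), phi(Q)
    kv = Kp.T @ V                    # (d, d)
    z = Qp @ Kp.sum(axis=0)          # (n,) normalizer
    return (Qp @ kv) / z[:, None]

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

On these toy shapes both functions return (512, 64) outputs; only the asymptotic cost in sequence length differs.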
Author:
Mukobi, Gabriel, Chatain, Peter, Fong, Su, Windesheim, Robert, Kutyniok, Gitta, Bhatia, Kush, Alberti, Silas
While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these models, Supervi…
External link:
http://arxiv.org/abs/2310.16763
Author:
Chen, Mayee F., Roberts, Nicholas, Bhatia, Kush, Wang, Jue, Zhang, Ce, Sala, Frederic, Ré, Christopher
The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework…
External link:
http://arxiv.org/abs/2307.14430
Author:
Guha, Neel, Chen, Mayee F., Bhatia, Kush, Mirhoseini, Azalia, Sala, Frederic, Ré, Christopher
Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, i…
External link:
http://arxiv.org/abs/2307.11031
Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying mod…
External link:
http://arxiv.org/abs/2306.07536
Specifying reward functions for complex tasks like object manipulation or driving is challenging to do by hand. Reward learning seeks to address this by learning a reward model using human feedback on selected query policies. This shifts the burden o…
External link:
http://arxiv.org/abs/2302.12349
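As a rough illustration of the reward-learning setup described above, the sketch below fits a linear reward model from pairwise human preferences using a Bradley-Terry likelihood. The query-selection problem the paper addresses is not modeled here, and the feature dimensions, synthetic data, and learning rate are assumptions.

```python
# Minimal sketch of reward learning from human feedback under one common
# formulation (Bradley-Terry preferences over trajectory features).
import numpy as np

def fit_reward(features_a, features_b, prefers_a, lr=0.1, steps=500):
    """Learn linear reward weights w so that r(x) = w . x explains which of
    two trajectories (given by feature vectors) the human preferred."""
    w = np.zeros(features_a.shape[1])
    diff = features_a - features_b
    for _ in range(steps):
        # P(human prefers a over b) = sigmoid(w . (x_a - x_b))
        p_a = 1.0 / (1.0 + np.exp(-diff @ w))
        # Gradient of the average log-likelihood of the observed preferences.
        grad = diff.T @ (prefers_a - p_a) / len(prefers_a)
        w += lr * grad
    return w

# Toy data: true reward weights and noisy pairwise preferences.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
xa, xb = rng.standard_normal((200, 3)), rng.standard_normal((200, 3))
p_true = 1.0 / (1.0 + np.exp(-(xa - xb) @ w_true))
prefers_a = (rng.uniform(size=200) < p_true).astype(float)
print(fit_reward(xa, xb, prefers_a))
```

With enough comparisons, the learned weights move toward `w_true`; the point of the sketch is only that pairwise human judgments suffice to fit a reward model.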
For traffic routing platforms, the choice of which route to recommend to a user depends on the congestion on these routes -- indeed, an individual's utility depends on the number of people using the recommended route at that instance. Motivated by th…
External link:
http://arxiv.org/abs/2301.09251
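The congestion effect described above can be illustrated with a toy congestion game: each user's utility for a route falls with how many others use it, and repeated best-response updates settle into a split across routes. This is a generic sketch under assumed route values and slowdown rates, not the algorithm studied in the linked paper.

```python
# Toy congestion game: utility of a route = intrinsic value minus a
# per-user slowdown times the number of users on it. Parameters assumed.
import numpy as np

base_value = np.array([10.0, 8.0, 6.0])      # intrinsic value of 3 routes
congestion_cost = np.array([0.5, 0.2, 0.1])  # per-user slowdown on each route
n_users = 30

def utility(route, loads):
    # Utility of picking `route` given per-route user counts `loads`.
    return base_value[route] - congestion_cost[route] * loads[route]

# Best-response dynamics: users repeatedly switch to their best route.
choices = np.zeros(n_users, dtype=int)
for _ in range(100):
    changed = False
    for i in range(n_users):
        loads = np.bincount(choices, minlength=3)
        loads[choices[i]] -= 1  # exclude user i's own contribution
        options = [utility(r, loads + np.eye(3, dtype=int)[r]) for r in range(3)]
        best = int(np.argmax(options))
        if best != choices[i]:
            choices[i], changed = best, True
    if not changed:
        break

print(np.bincount(choices, minlength=3))  # users per route once stable
```

The printout shows how users spread across routes once no one can gain by switching, which is the interdependence the abstract points to.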
Inferring reward functions from human behavior is at the center of value alignment - aligning AI objectives with what we, humans, actually want. But doing so relies on models of how humans behave given their objectives. After decades of research in c…
External link:
http://arxiv.org/abs/2212.04717