Search results

Report

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Authors: Zhang, Natalia; Wang, Xinqi; Cui, Qiwen; Zhou, Runlong; Kakade, Sham M.; Du, Simon S.

We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying Nash equilibrium from a preference-only offline dataset in g

External link: http://arxiv.org/abs/2409.00717

Report

Deconstructing What Makes a Good Optimizer for Language Models

Authors: Zhao, Rosie; Morwani, Depen; Brandfonbrener, David; Vyas, Nikhil; Kakade, Sham

Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most e

External link: http://arxiv.org/abs/2407.07972

Report

Universal Length Generalization with Turing Programs

Authors: Hou, Kaiying; Brandfonbrener, David; Kakade, Sham; Jelassi, Samy; Malach, Eran

Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve le

External link: http://arxiv.org/abs/2407.03310

Report

Eliminating Position Bias of Language Models: A Mechanistic Approach

Authors: Wang, Ziqi; Zhang, Hanlin; Li, Xiner; Huang, Kuan-Hao; Han, Chi; Ji, Shuiwang; Kakade, Sham M.; Peng, Hao; Ji, Heng

Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness

External link: http://arxiv.org/abs/2407.01100

Report

A New Perspective on Shampoo's Preconditioner

Authors: Morwani, Depen; Shapira, Itai; Vyas, Nikhil; Malach, Eran; Kakade, Sham; Janson, Lucas

Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation

External link: http://arxiv.org/abs/2406.17748

Report

DataComp-LM: In search of the next generation of training sets for language models

Authors: Li, Jeffrey; Fang, Alex; Smyrnis, Georgios; Ivgi, Maor; Jordan, Matt; Gadre, Samir; Bansal, Hritik; Guha, Etash; Keh, Sedrick; Arora, Kushal; Garg, Saurabh; Xin, Rui; Muennighoff, Niklas; Heckel, Reinhard; Mercat, Jean; Chen, Mayee; Gururangan, Suchin; Wortsman, Mitchell; Albalak, Alon; Bitton, Yonatan; Nezhurina, Marianna; Abbas, Amro; Hsieh, Cheng-Yu; Ghosh, Dhruba; Gardner, Josh; Kilian, Maciej; Zhang, Hanlin; Shao, Rulin; Pratt, Sarah; Sanyal, Sunny; Ilharco, Gabriel; Daras, Giannis; Marathe, Kalyani; Gokaslan, Aaron; Zhang, Jieyu; Chandu, Khyathi; Nguyen, Thao; Vasiljevic, Igor; Kakade, Sham; Song, Shuran; Sanghavi, Sujay; Faghri, Fartash; Oh, Sewoong; Zettlemoyer, Luke; Lo, Kyle; El-Nouby, Alaaeldin; Pouransari, Hadi; Toshev, Alexander; Wang, Stephanie; Groeneveld, Dirk; Soldaini, Luca; Koh, Pang Wei; Jitsev, Jenia; Kollar, Thomas; Dimakis, Alexandros G.; Carmon, Yair; Dave, Achal; Schmidt, Ludwig; Shankar, Vaishaal

We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretrai

External link: http://arxiv.org/abs/2406.11794

Report

Transcendence: Generative Models Can Outperform The Experts That Train Them

Authors: Zhang, Edwin; Zhu, Vincent; Saphra, Naomi; Kleiman, Anat; Edelman, Benjamin L.; Tambe, Milind; Kakade, Sham M.; Malach, Eran

Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outp

External link: http://arxiv.org/abs/2406.11741

Report

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

Authors: Brandfonbrener, David; Zhang, Hanlin; Kirsch, Andreas; Schwarz, Jonathan Richard; Kakade, Sham

Selecting high-quality data for pre-training is crucial in shaping the downstream task performance of language models. A major challenge lies in identifying this optimal subset, a problem generally considered intractable, thus necessitating scalable

External link: http://arxiv.org/abs/2406.10670

Report

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Authors: Lin, Licong; Wu, Jingfeng; Kakade, Sham M.; Bartlett, Peter L.; Lee, Jason D.

Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approxi

External link: http://arxiv.org/abs/2406.08466

Report

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Authors: Shen, Ethan; Fan, Alan; Pratt, Sarah M.; Park, Jae Sung; Wallingford, Matthew; Kakade, Sham M.; Holtzman, Ari; Krishna, Ranjay; Farhadi, Ali; Kusupati, Aditya

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autor

External link: http://arxiv.org/abs/2405.18400
