Showing 1 - 9 of 9 for search: '"Paul, Mansheej"'
Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). This limits the cap
External link:
http://arxiv.org/abs/2408.11791
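For context on the snippet above, which contrasts generative reward modeling with the standard approach, here is a minimal sketch of the conventional preference-score setup it refers to (a Bradley-Terry style pairwise loss); this illustrates the baseline, not the paper's proposed method, and the tensor names are assumptions.

```python
# Sketch of a conventional pairwise reward-model loss (Bradley-Terry style).
# Illustrative only; shapes and names are hypothetical.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen: torch.Tensor,
                         reward_rejected: torch.Tensor) -> torch.Tensor:
    """reward_chosen / reward_rejected: scalar scores per preference pair, shape (batch,)."""
    # Maximize the log-probability that the chosen response outscores the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Usage with dummy reward scores standing in for a reward head's outputs.
scores_chosen = torch.randn(8, requires_grad=True)
scores_rejected = torch.randn(8, requires_grad=True)
loss = pairwise_reward_loss(scores_chosen, scores_rejected)
loss.backward()
```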
Pretraining datasets for large language models (LLMs) have grown to trillions of tokens composed of large amounts of CommonCrawl (CC) web scrape along with smaller, domain-specific datasets. It is expensive to understand the impact of these domain-sp
External link:
http://arxiv.org/abs/2406.03476
Author:
Ankner, Zachary, Blakeney, Cody, Sreenivasan, Kartik, Marion, Max, Leavitt, Matthew L., Paul, Mansheej
In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a
External link:
http://arxiv.org/abs/2405.20541
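As a rough illustration of the general idea in the snippet above (scoring documents with a small reference model and keeping a subset), here is a hedged sketch; the scoring callable, keep fraction, and selection rule are assumptions, not the paper's exact recipe.

```python
# Sketch: prune a text corpus using perplexity scores from a small reference model.
# `small_lm_perplexity` is a stand-in callable (assumption) returning one document's
# perplexity; the "keep lowest scores" rule is just one of several possible choices.
from typing import Callable, List

def prune_by_perplexity(docs: List[str],
                        small_lm_perplexity: Callable[[str], float],
                        keep_fraction: float = 0.5) -> List[str]:
    # Score every document with the small model, then keep the lowest-scoring slice.
    scored = sorted(docs, key=small_lm_perplexity)
    n_keep = max(1, int(len(scored) * keep_fraction))
    return scored[:n_keep]

# Example with a toy scorer that uses token count as a stand-in for perplexity.
toy_scorer = lambda doc: float(len(doc.split()))
subset = prune_by_perplexity(["a b c", "a", "a b"], toy_scorer, keep_fraction=0.67)
```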
Author:
Biderman, Dan, Portes, Jacob, Ortiz, Jose Javier Gonzalez, Paul, Mansheej, Greengard, Philip, Jennings, Connor, King, Daniel, Havens, Sam, Chiley, Vitaliy, Frankle, Jonathan, Blakeney, Cody, Cunningham, John P.
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and f
External link:
http://arxiv.org/abs/2405.09673
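The snippet above describes the core LoRA mechanism (training only a low-rank perturbation to a frozen weight matrix); a minimal sketch of that idea follows, written as a generic illustration with assumed rank and scaling hyperparameters, not as the paper's experimental setup.

```python
# Minimal LoRA-style linear layer: the pretrained weight W is frozen and only the
# low-rank factors A and B are trained, so the effective weight is W + (alpha/r) * B @ A.
# Rank and scaling values are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)   # trainable
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))         # trainable, zero init
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

layer = LoRALinear(512, 512, rank=8)
y = layer(torch.randn(4, 512))  # only lora_A and lora_B receive gradients
```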
Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally $
External link:
http://arxiv.org/abs/2306.15063
Author:
Paul, Mansheej, Chen, Feng, Larsen, Brett W., Frankle, Jonathan, Ganguli, Surya, Dziugaite, Gintare Karolina
Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is
External link:
http://arxiv.org/abs/2210.03044
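The lottery-ticket entries here build on iterative magnitude pruning; below is a hedged sketch of the standard IMP-with-rewinding loop, using hypothetical helper functions and illustrative settings rather than the papers' exact protocols.

```python
# Sketch of iterative magnitude pruning (IMP) with weight rewinding.
# `train`, `magnitude_prune`, and `rewind_unpruned_weights` are hypothetical helpers;
# the prune fraction and number of rounds are illustrative.
import copy

def iterative_magnitude_pruning(model, train, magnitude_prune, rewind_unpruned_weights,
                                rounds: int = 5, prune_fraction: float = 0.2):
    # Snapshot the early-training weights that pruned-and-rewound networks restart from.
    rewind_state = copy.deepcopy(model.state_dict())
    mask = None
    for _ in range(rounds):
        train(model, mask)                                   # train under the current sparsity mask
        mask = magnitude_prune(model, mask, prune_fraction)  # drop the smallest-magnitude weights
        rewind_unpruned_weights(model, rewind_state, mask)   # rewind survivors to the snapshot
    return model, mask
```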
Author:
Paul, Mansheej, Larsen, Brett W., Ganguli, Surya, Frankle, Jonathan, Dziugaite, Gintare Karolina
A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same
External link:
http://arxiv.org/abs/2206.01278
Published in:
Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Recent success in deep learning has partially been driven by training increasingly overparametrized networks on ever larger datasets. It is therefore natural to ask: how much of the data is superfluous, which examples are important for generalization
External link:
http://arxiv.org/abs/2107.07075
Author:
Fort, Stanislav, Dziugaite, Gintare Karolina, Paul, Mansheej, Kharaghani, Sepideh, Roy, Daniel M., Ganguli, Surya
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. S
External link:
http://arxiv.org/abs/2010.15110
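The last snippet refers to the linear (NTK) approximation of training dynamics; for reference, the standard first-order expansion of a network around its initialization and the associated empirical kernel, in textbook form rather than as a result specific to this paper:

```latex
% Linearized network around initialization \theta_0 (standard NTK setting):
f_{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^{\top}\,(\theta - \theta_0),
\qquad
\Theta(x, x') \;=\; \nabla_\theta f(x;\theta_0)^{\top}\,\nabla_\theta f(x';\theta_0).
```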