Showing 1 - 10 of 8,916 for search: '"Wilson Andrew"'
Author:
Lotfi, Sanae, Kuang, Yilun, Amos, Brandon, Goldblum, Micah, Finzi, Marc, Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale… (an illustrative bound is sketched below).
External link:
http://arxiv.org/abs/2407.18158
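For context, the flavor of a "compression-based generalization bound" can be made concrete. Below is a minimal Occam-style bound for a loss bounded in [0, 1], assuming only that the model's description compresses to a known number of bits; the function name and exact constants are illustrative, not the paper's bound.

import math

def compression_bound(empirical_loss, compressed_bits, n, delta=0.05):
    # With probability >= 1 - delta, population loss <= empirical loss
    # plus a complexity term that grows with the compressed model size
    # (in bits) and shrinks with the number of samples n.
    complexity = compressed_bits * math.log(2) + math.log(1.0 / delta)
    return empirical_loss + math.sqrt(complexity / (2 * n))

# A bound is "non-vacuous" only if it lands below the trivial value 1.0;
# at billions of parameters, compressed_bits alone can push it past that.
print(compression_bound(empirical_loss=0.3, compressed_bits=1e9, n=1e10))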
Author:
Shwartz-Ziv, Ravid, Goldblum, Micah, Bansal, Arpit, Bruss, C. Bayan, LeCun, Yann, Wilson, Andrew Gordon
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible… (a quick empirical probe is sketched below).
External link:
http://arxiv.org/abs/2406.11463
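The claim is easy to probe empirically. A hypothetical experiment (not the paper's protocol): give a small network purely random labels, so interpolation requires memorization, and compare the number of samples it actually fits against its parameter count.

import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2000, 20
X = torch.randn(n, d)
y = torch.randint(0, 2, (n,)).float()        # random labels: memorization only

model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
n_params = sum(p.numel() for p in model.parameters())   # 1409 here
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(3000):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

acc = ((model(X).squeeze(1) > 0) == y.bool()).float().mean()
print(f"{n_params} params vs {n} samples: train accuracy {acc:.3f}")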
To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately, the space of all possible causal graphs is enormous, so scalably and accurately… (see the counting sketch below).
External link:
http://arxiv.org/abs/2406.09177
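"Enormous" is literal here: the number of labeled DAGs grows super-exponentially in the number of variables. Robinson's recurrence (OEIS A003024) makes the point in a few lines:

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    # Robinson's recurrence for the number of DAGs on n labeled nodes.
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in (3, 5, 10):
    print(n, num_dags(n))
# n = 10 already gives ~4.2e18 candidate graphs, so exhaustive search
# is hopeless and scalable structure learning is the whole game.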
Author:
Kapoor, Sanyam, Gruver, Nate, Roberts, Manley, Collins, Katherine, Pal, Arka, Bhatt, Umang, Weller, Adrian, Dooley, Samuel, Goldblum, Micah, Wilson, Andrew Gordon
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce… (a minimal calibration metric is sketched below).
External link:
http://arxiv.org/abs/2406.08391
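"Calibrated" has a precise meaning: among predictions offered with confidence p, roughly a fraction p should be correct. A standard way to quantify the gap is expected calibration error (ECE); the sketch below assumes per-answer confidences and correctness flags have already been extracted from the LLM.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence, then average the per-bin
    # |accuracy - mean confidence| gap, weighted by bin size.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Uniformly overconfident model: says 0.99, is right 60% of the time.
print(expected_calibration_error([0.99] * 10, [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]))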
Author:
Qiu, Shikai, Han, Boran, Maddix, Danielle C., Zhang, Shuai, Wang, Yuyang, Wilson, Andrew Gordon
How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information… (one standard alternative is sketched below).
External link:
http://arxiv.org/abs/2406.07337
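One widely used alternative to initializing from pre-trained weights is knowledge distillation, where the small model is trained to match the large model's output distribution. The sketch below shows the classic temperature-scaled KL objective; it is a stand-in for "transferring knowledge", not necessarily the method this paper proposes.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions; the t**2 factor keeps gradient magnitudes
    # comparable across temperatures.
    t = temperature
    log_student = F.log_softmax(student_logits / t, dim=-1)
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t ** 2

student = torch.randn(8, 100, requires_grad=True)   # small downstream model
teacher = torch.randn(8, 100)                       # frozen foundation model
print(distillation_loss(student, teacher))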
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional… (a cost comparison is sketched below).
External link:
http://arxiv.org/abs/2406.06248
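The arithmetic behind the bottleneck is simple: a dense d-by-d layer costs d**2 multiply-accumulates per input, while structured families cost far less. A rank-r factorization is one illustrative structure, not necessarily the one the paper advocates; assuming square layers:

import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    # y = (x @ U) @ V costs 2*d*r multiply-accumulates versus d*d dense.
    def __init__(self, d, r):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d, r) / d ** 0.5)
        self.V = nn.Parameter(torch.randn(r, d) / r ** 0.5)

    def forward(self, x):
        return (x @ self.U) @ self.V

d, r = 4096, 128
print(f"dense: {d * d:,} MACs, rank-{r}: {2 * d * r:,} MACs "
      f"({d * d / (2 * d * r):.0f}x cheaper)")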
Author:
Lavoie, Samuel, Kirichenko, Polina, Ibrahim, Mark, Assran, Mahmoud, Wilson, Andrew Gordon, Courville, Aaron, Ballas, Nicolas
There are a thousand ways to caption an image. Contrastive Language-Image Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image… (the contrastive objective is sketched below).
External link:
http://arxiv.org/abs/2405.00740
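The "single vector" bottleneck comes from CLIP's training objective: every image and every caption is embedded once, and a symmetric contrastive loss pulls matching pairs together. A schematic of that loss, assuming encoders that return L2-normalized embeddings:

import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE: row i of each matrix is a matching pair,
    # every other row in the batch serves as a negative.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(image_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

img = F.normalize(torch.randn(16, 512), dim=-1)   # image encoder outputs
txt = F.normalize(torch.randn(16, 512), dim=-1)   # text encoder outputs
print(clip_loss(img, txt))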
Author:
Souri, Hossein, Bansal, Arpit, Kazemi, Hamid, Fowl, Liam, Saha, Aniruddha, Geiping, Jonas, Wilson, Andrew Gordon, Chellappa, Rama, Goldstein, Tom, Goldblum, Micah
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet… (the basic threat model is sketched below).
External link:
http://arxiv.org/abs/2403.16365
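To illustrate the threat model (this is a textbook BadNets-style trigger, not the attack this paper develops): an attacker stamps a small patch on a handful of training images and relabels them, and a model trained on the mix learns to obey the patch.

import numpy as np

def poison(images, labels, target_class, rate=0.01, seed=0):
    # Stamp a 3x3 white patch in one corner of a random subset of
    # images and relabel them; returns poisoned copies.
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:, :] = 1.0        # the trigger
    labels[idx] = target_class            # the attacker's chosen label
    return images, labels

x = np.random.rand(1000, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
px, py = poison(x, y, target_class=0)
print(int((px != x).any(axis=(1, 2, 3)).sum()), "examples carry the trigger")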
Author:
Rajaram, Shwetha, Numan, Nels, Kumaravel, Balasaravanan Thoravi, Marquardt, Nicolai, Wilson, Andrew D.
Today's video-conferencing tools support a rich range of professional and social activities, but their generic, grid-based environments cannot be easily adapted to meet the varying needs of distributed collaborators. To enable end-user customization, …
External link:
http://arxiv.org/abs/2403.13947
Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this… (a worst-group evaluation is sketched below).
External link:
http://arxiv.org/abs/2403.09869
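Under subpopulation shift the quantity of interest is usually worst-group performance rather than average accuracy, since a model can score well on average while failing a minority group. A small evaluation helper, assuming group labels are available at test time:

import numpy as np

def worst_group_accuracy(preds, labels, groups):
    # Per-group accuracy, reported alongside its minimum: the number
    # subpopulation-shift methods try to raise.
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    accs = {int(g): float((preds[groups == g] == labels[groups == g]).mean())
            for g in np.unique(groups)}
    return min(accs.values()), accs

# 80% average accuracy, but the minority group (group 1) is at 0%.
preds  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(worst_group_accuracy(preds, labels, groups))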