Showing 1 - 10 of 176 for search: '"Gromov, Andrey A."'
Author:
Schaeffer, Rylan, Lecomte, Victor, Pai, Dhruv Bhandarkar, Carranza, Andres, Isik, Berivan, Unell, Alyssa, Khona, Mikail, Yerxa, Thomas, LeCun, Yann, Chung, SueYeon, Gromov, Andrey, Shwartz-Ziv, Ravid, Koyejo, Sanmi
Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL …
External link:
http://arxiv.org/abs/2406.09366
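The geometric idea behind MMCR can be made concrete with a short sketch. As the objective is usually described in the MVSSL literature, each view embedding is projected to the unit sphere, the views of each example are averaged into a centroid, and the nuclear norm of the centroid matrix is maximized. The PyTorch code below is an illustrative sketch of that objective, not the authors' implementation; the tensor layout and names are assumptions.

    import torch
    import torch.nn.functional as F

    def mmcr_loss(view_embeddings: torch.Tensor) -> torch.Tensor:
        # view_embeddings: (batch, n_views, dim) raw encoder outputs (assumed layout).
        z = F.normalize(view_embeddings, dim=-1)   # project each view to the unit sphere
        centroids = z.mean(dim=1)                  # per-example centroid, shape (batch, dim)
        # Maximizing the nuclear norm of the centroid matrix spreads the centroids
        # across many directions; return its negative so it can be minimized.
        return -torch.linalg.matrix_norm(centroids, ord="nuc")

    # Toy usage: 256 examples, 4 augmented views each, 128-dimensional embeddings.
    print(mmcr_loss(torch.randn(256, 4, 128)))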
Neural networks readily learn a subset of modular arithmetic tasks while failing to generalize on the rest, a limitation that persists regardless of the choice of architecture and training strategy. On the other hand, an analytical solution for the …
External link:
http://arxiv.org/abs/2406.03495
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in …
External link:
http://arxiv.org/abs/2406.02550
Author:
Gerstgrasser, Matthias, Schaeffer, Rylan, Dey, Apratim, Rafailov, Rafael, Sleight, Henry, Hughes, John, Korbak, Tomasz, Agrawal, Rajashree, Pai, Dhruv, Gromov, Andrey, Roberts, Daniel A., Yang, Diyi, Donoho, David L., Koyejo, Sanmi
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that …
External link:
http://arxiv.org/abs/2404.01413
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are …
External link:
http://arxiv.org/abs/2403.17887
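To illustrate the kind of strategy the snippet describes, here is a minimal sketch that removes a contiguous block of transformer layers. It assumes a decoder-only model whose blocks live in an nn.ModuleList, as in typical open-weight LLM codebases, and it is not the paper's exact procedure; real pipelines typically also decide which block to drop (for example by comparing layer representations) and fine-tune briefly afterwards.

    import torch.nn as nn

    def drop_layers(layers: nn.ModuleList, start: int, n_drop: int) -> nn.ModuleList:
        # Keep every block except the contiguous range [start, start + n_drop).
        kept = [blk for i, blk in enumerate(layers)
                if not (start <= i < start + n_drop)]
        return nn.ModuleList(kept)

    # Hypothetical usage with a Hugging Face-style model object:
    # model.model.layers = drop_layers(model.model.layers, start=20, n_drop=8)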
Author:
Schaeffer, Rylan, Zahedi, Nika, Khona, Mikail, Pai, Dhruv, Truong, Sang, Du, Yilun, Ostrow, Mitchell, Chandra, Sarthak, Carranza, Andres, Fiete, Ila Rani, Gromov, Andrey, Koyejo, Sanmi
Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete, and retrieve data, whereas the second studies learning and sampling from probability distributions …
External link:
http://arxiv.org/abs/2402.10202
Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know whether the network has memorized a particular set of examples or understood the underlying …
External link:
http://arxiv.org/abs/2310.13061
Published in:
Phys. Rev. Lett. 130, 176501 (2023)
The Moore-Read state, one of the leading candidates for describing the fractional quantum Hall effect at filling factor $\nu{=}5/2$, is a paradigmatic $p$-wave superconductor with non-Abelian topological order. Among its many exotic properties, the …
External link:
http://arxiv.org/abs/2301.04169
Author:
Gromov, Andrey
We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as "grokking". Concretely, we present (i) fully-connected two-layer networks that exhibit grokking on various modular arithmetic …
External link:
http://arxiv.org/abs/2301.02679
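The setup in this abstract lends itself to a compact sketch: a fully-connected two-layer network trained on modular addition, where grokking appears as test accuracy jumping long after the training loss has saturated. The modulus, width, activation, and optimizer below are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    p = 97  # modulus (illustrative choice)
    # All pairs (a, b) labeled by (a + b) mod p, encoded as concatenated one-hots.
    a, b = torch.meshgrid(torch.arange(p), torch.arange(p), indexing="ij")
    a, b = a.flatten(), b.flatten()
    x = torch.cat([F.one_hot(a, p), F.one_hot(b, p)], dim=1).float()
    y = (a + b) % p

    model = nn.Sequential(nn.Linear(2 * p, 512), nn.ReLU(), nn.Linear(512, p))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

    # Train on half of the pairs; track accuracy on the held-out half to
    # observe delayed generalization.
    perm = torch.randperm(p * p)
    train, test = perm[: p * p // 2], perm[p * p // 2 :]
    for step in range(10_000):
        opt.zero_grad()
        F.cross_entropy(model(x[train]), y[train]).backward()
        opt.step()
        if step % 1000 == 0:
            with torch.no_grad():
                acc = (model(x[test]).argmax(-1) == y[test]).float().mean()
            print(step, acc.item())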
Published in:
Phys. Rev. B 107, 125119 (2023)
Supersymmetry and supergravity were invented in the 1970s to solve fundamental problems in high-energy physics. Even though neither of these ideas has yet been confirmed in high-energy and cosmology experiments, they have been beneficial in constructing …
External link:
http://arxiv.org/abs/2212.00686