Showing 1 - 10 of 548 for search: '"Belkin, Mikhail A."'
Transformers exhibit In-Context Learning (ICL), where these models solve new tasks by using examples in the prompt without additional training. In our work, we identify and analyze two key components of ICL: (1) context-scaling, where model performance…
External link:
http://arxiv.org/abs/2410.12783
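This entry concerns context-scaling, i.e. how ICL behavior changes with the number of in-context examples. Below is a minimal sketch of how one might build prompts with a growing number of in-context examples to probe context-scaling; the toy linear task, prompt format, and the `query_model` placeholder are illustrative assumptions, not the paper's setup.

```python
# Hypothetical probe of context-scaling: build few-shot prompts with an
# increasing number of in-context examples for a toy linear-regression task.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)  # hidden linear task: y = w . x (assumed for illustration)

def make_prompt(n_examples: int) -> str:
    lines = []
    for _ in range(n_examples):
        x = rng.normal(size=3)
        y = float(w @ x)
        lines.append(f"x={np.round(x, 2).tolist()} -> y={y:.2f}")
    x_query = rng.normal(size=3)
    lines.append(f"x={np.round(x_query, 2).tolist()} -> y=")
    return "\n".join(lines)

# Context-scaling: sweep the number of in-context examples and (with a real
# model) track prediction accuracy as a function of context length.
for n in (2, 8, 32, 128):
    prompt = make_prompt(n)
    # answer = query_model(prompt)  # placeholder for an actual model call
    print(f"{n:>4} in-context examples, prompt length {len(prompt)} chars")
```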
We study the Laplacian of the undirected De Bruijn graph over an alphabet $A$ of order $k$. While the eigenvalues of this Laplacian were found in 1998 by Delorme and Tillich [1], an explicit description of its eigenvectors has remained elusive. In th…
External link:
http://arxiv.org/abs/2410.07622
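The object in this entry, the Laplacian of the undirected De Bruijn graph, is easy to experiment with numerically. A small sketch follows, assuming a binary alphabet and symmetrizing the directed shift edges to get the undirected graph; it builds the graph Laplacian and prints its eigenvalues. This only illustrates the object being studied, not the paper's eigenvector construction.

```python
# Sketch: undirected De Bruijn graph Laplacian over a binary alphabet.
# Vertices are length-k strings; u -> u[1:] + a are the directed edges,
# which we symmetrize. Dropping self-loops is a simplifying assumption.
import itertools
import numpy as np

def de_bruijn_laplacian(alphabet=("0", "1"), k=3):
    nodes = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A = np.zeros((n, n))
    for u in nodes:
        for a in alphabet:
            v = u[1:] + a          # shift-and-append edge
            A[index[u], index[v]] += 1.0
    A = A + A.T                    # symmetrize: undirected multigraph adjacency
    np.fill_diagonal(A, 0.0)       # drop self-loops (simplifying assumption)
    D = np.diag(A.sum(axis=1))
    return D - A

L = de_bruijn_laplacian(k=3)
print(np.round(np.linalg.eigvalsh(L), 4))
```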
Author:
Stich, Simon, Mohajan, Jewel, de Ceglia, Domenico, Carletti, Luca, Jung, Hyunseung, Karl, Nicholas, Brener, Igal, Rodriguez, Alejandro W., Belkin, Mikhail A., Sarma, Raktim
Nonlinear metasurfaces offer a new paradigm to realize optical nonlinear devices with new and unparalleled behavior compared to nonlinear crystals, due to the interplay between photonic resonances and materials properties. The complicated interdepend…
External link:
http://arxiv.org/abs/2409.18196
Author:
Mallinar, Neil, Beaglehole, Daniel, Zhu, Libin, Radhakrishnan, Adityanarayanan, Pandit, Parthe, Belkin, Mikhail
Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example of "emerge…
External link:
http://arxiv.org/abs/2407.20199
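Modular arithmetic tasks of the kind mentioned in this entry are simple to generate, which is part of why grokking is commonly studied on them. Below is a minimal sketch of a modular addition dataset and train/test split; the modulus, split fraction, and one-hot encoding are illustrative assumptions, not the paper's configuration.

```python
# Sketch: dataset for modular addition (a + b) mod p, the kind of task on
# which grokking is typically observed. Modulus, split, and encoding are
# illustrative choices, not taken from the paper.
import numpy as np

p = 97                      # prime modulus (assumed)
train_fraction = 0.5        # fraction of all pairs used for training (assumed)

pairs = np.array([(a, b) for a in range(p) for b in range(p)])
labels = (pairs[:, 0] + pairs[:, 1]) % p

rng = np.random.default_rng(0)
perm = rng.permutation(len(pairs))
n_train = int(train_fraction * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# One-hot encode the two operands. A model that reaches 100% accuracy on the
# train split may only much later start improving on the held-out split
# (grokking), which one would see by tracking test accuracy over long training.
X = np.zeros((len(pairs), 2 * p))
X[np.arange(len(pairs)), pairs[:, 0]] = 1.0
X[np.arange(len(pairs)), p + pairs[:, 1]] = 1.0

print("train:", len(train_idx), "test:", len(test_idx), "classes:", p)
```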
Author:
Cotrufo, Michele, Krakofsky, Jonas, Mann, Sander A., Böhm, Gerhard, Belkin, Mikhail A., Alù, Andrea
Nonlinear intersubband polaritonic metasurfaces support one of the strongest known ultrafast nonlinear responses in the mid-infrared frequency range across all condensed matter systems. Beyond harmonic generation and frequency mixing, these nonlinear…
External link:
http://arxiv.org/abs/2403.15911
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained…
External link:
http://arxiv.org/abs/2402.13728
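Neural collapse is usually quantified by comparing within-class and between-class variability of layer representations. A small sketch of one common such statistic, the ratio of within-class to between-class scatter, evaluated on synthetic features; the specific metric and the toy data are assumptions for illustration, not this paper's measurement protocol.

```python
# Sketch: within-class vs. between-class variability statistic of the kind
# used to quantify (deep) neural collapse. Synthetic features are assumed.
import numpy as np

def collapse_metric(features: np.ndarray, labels: np.ndarray) -> float:
    """tr(Sigma_W) / tr(Sigma_B): smaller values indicate more collapsed features."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    sw = np.zeros((features.shape[1], features.shape[1]))
    sb = np.zeros_like(sw)
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        sw += (fc - mu_c).T @ (fc - mu_c) / len(features)
        sb += len(fc) / len(features) * np.outer(mu_c - global_mean,
                                                 mu_c - global_mean)
    return float(np.trace(sw) / np.trace(sb))

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
feats = rng.normal(size=(1000, 64))
feats[np.arange(1000), labels] += 5.0   # toy class structure along one axis per class
print(round(collapse_metric(feats, labels), 4))
```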
Mitigating the retention of sensitive or private information in large language models is essential for enhancing privacy and safety. Existing unlearning methods, like Gradient Ascent and Negative Preference Optimization, directly tune models to remove…
External link:
http://arxiv.org/abs/2402.10052
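Gradient Ascent unlearning, one of the baselines named in this entry, simply pushes model parameters to increase the loss on the data to be forgotten. Below is a minimal sketch of that idea on a toy logistic-regression "model"; the toy model, forget set, learning rate, and step count are illustrative assumptions, not the methods compared in the paper.

```python
# Sketch: Gradient Ascent unlearning on a toy logistic-regression model.
# The parameters are updated to *maximize* the loss on a "forget" set.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                      # stand-in for trained parameters

X_forget = rng.normal(size=(32, 5))         # examples to be unlearned (assumed)
y_forget = rng.integers(0, 2, size=32)

def loss_and_grad(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = X.T @ (p - y) / len(y)           # gradient of cross-entropy w.r.t. w
    return loss, grad

lr = 0.1
for step in range(20):
    loss, grad = loss_and_grad(w, X_forget, y_forget)
    w = w + lr * grad                       # ascent: move *up* the forget-set loss
    if step % 5 == 0:
        print(f"step {step:2d}  forget-set loss {loss:.3f}")
```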
A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly…
External link:
http://arxiv.org/abs/2401.04553
Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an…
External link:
http://arxiv.org/abs/2312.03311
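The poor conditioning mentioned in this entry comes from the rapidly decaying spectrum of kernel matrices; spectral preconditioning damps the top eigendirections so that simple iterative solvers converge much faster. A sketch of this idea for a kernel ridge regression solve with a preconditioned gradient iteration follows; the RBF kernel, the number of deflated eigendirections, and the step size are illustrative assumptions, not the algorithm of the paper.

```python
# Sketch: spectral preconditioning for the iterative solve (K + lam*I) alpha = y.
# The top-s eigendirections of K are rescaled so the preconditioned system is
# far better conditioned; kernel, s, and step size are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * 10.0))                   # RBF kernel matrix (bandwidth assumed)
lam = 1e-3
A = K + lam * np.eye(len(X))

# Preconditioner from the top-s eigenpairs of K: shrink those directions so
# their effective eigenvalue matches the (s+1)-th one.
s = 20
vals, vecs = np.linalg.eigh(K)
vals, vecs = vals[::-1], vecs[:, ::-1]         # sort eigenpairs in descending order
top_vals, top_vecs = vals[:s], vecs[:, :s]
tail = vals[s] + lam
P = np.eye(len(X)) - top_vecs @ np.diag(1 - tail / (top_vals + lam)) @ top_vecs.T

alpha = np.zeros(len(X))
eta = 1.0 / tail                               # step size matched to the tail eigenvalue
for it in range(200):
    alpha -= eta * (P @ (A @ alpha - y))       # preconditioned gradient step
print("residual:", np.linalg.norm(A @ alpha - y))
```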
In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training…
External link:
http://arxiv.org/abs/2311.14646