Showing 1 - 10 of 979 for search: '"Carmon P"'
Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scalin
External link:
http://arxiv.org/abs/2406.19146
Author:
Carmon, Oded, Zelingher, Elad
We give a new construction of tensor product gamma factors for a pair of irreducible representations of $\operatorname{GL}_c\left(\mathbb{F}_q\right)$ and $\operatorname{GL}_k\left(\mathbb{F}_q\right)$. This construction is a finite field analog of a
External link:
http://arxiv.org/abs/2406.14262
Author:
Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, Shankar, Vaishaal
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretrai
External link:
http://arxiv.org/abs/2406.11794
Vizing's theorem states that any $n$-vertex $m$-edge graph of maximum degree $\Delta$ can be {\em edge colored} using at most $\Delta + 1$ different colors [Diskret.~Analiz, '64]. Vizing's original proof is algorithmic and shows that such an edge col
External link:
http://arxiv.org/abs/2405.15449
We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to opti
External link:
http://arxiv.org/abs/2404.00666
Author:
Gadre, Samir Yitzhak, Smyrnis, Georgios, Shankar, Vaishaal, Gururangan, Suchin, Wortsman, Mitchell, Shao, Rulin, Mercat, Jean, Fang, Alex, Li, Jeffrey, Keh, Sedrick, Xin, Rui, Nezhurina, Marianna, Vasiljevic, Igor, Jitsev, Jenia, Soldaini, Luca, Dimakis, Alexandros G., Ilharco, Gabriel, Koh, Pang Wei, Song, Shuran, Kollar, Thomas, Carmon, Yair, Dave, Achal, Heckel, Reinhard, Muennighoff, Niklas, Schmidt, Ludwig
Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimatel
External link:
http://arxiv.org/abs/2403.08540
Author:
Carmon, Yair, Hinder, Oliver
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in
External link:
http://arxiv.org/abs/2402.10898
An optical Fabry-Perot cavity with a movable mirror is a paradigmatic optomechanical system. While usually the mirror is supported by a mechanical spring, it has been shown that it is possible to keep one of the mirrors in a stable equilibrium purely b
External link:
http://arxiv.org/abs/2401.00954
We experimentally demonstrate and numerically analyze large arrays of whispering gallery resonators. Using fluorescent mapping, we measure the spatial distribution of the cavity-ensemble's resonances, revealing that light reaches distant resonators i
External link:
http://arxiv.org/abs/2312.12632
We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution using $\widetilde{O}(n \epsilon^{-
External link:
http://arxiv.org/abs/2311.10886