Search results

Report

Resolving Discrepancies in Compute-Optimal Scaling of Language Models

Authors: Porian, Tomer, Wortsman, Mitchell, Jitsev, Jenia, Schmidt, Ludwig, Carmon, Yair

Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scalin

External link: http://arxiv.org/abs/2406.19146

Report

On Ginzburg-Kaplan gamma factors and Bessel-Speh functions for finite general linear groups

Authors: Carmon, Oded, Zelingher, Elad

We give a new construction of tensor product gamma factors for a pair of irreducible representations of $\operatorname{GL}_c\left(\mathbb{F}_q\right)$ and $\operatorname{GL}_k\left(\mathbb{F}_q\right)$. This construction is a finite field analog of a

External link: http://arxiv.org/abs/2406.14262

Report

DataComp-LM: In search of the next generation of training sets for language models

Authors: Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, Shankar, Vaishaal

We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretrai

External link: http://arxiv.org/abs/2406.11794

Report

Faster $(\Delta + 1)$-Edge Coloring: Breaking the $m \sqrt{n}$ Time Barrier

Authors: Bhattacharya, Sayan, Carmon, Din, Costa, Martín, Solomon, Shay, Zhang, Tianyi

Vizing's theorem states that any $n$-vertex $m$-edge graph of maximum degree $\Delta$ can be edge colored using at most $\Delta + 1$ different colors [Diskret. Analiz, '64]. Vizing's original proof is algorithmic and shows that such an edge col

External link: http://arxiv.org/abs/2405.15449

Report

Accelerated Parameter-Free Stochastic Optimization

Authors: Kreisler, Itai, Ivgi, Maor, Hinder, Oliver, Carmon, Yair

We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to opti

External link: http://arxiv.org/abs/2404.00666

Report

Language models scale reliably with over-training and on downstream tasks

Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimatel

External link: http://arxiv.org/abs/2403.08540

Report

The Price of Adaptivity in Stochastic Convex Optimization

Authors: Carmon, Yair, Hinder, Oliver

We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in

External link: http://arxiv.org/abs/2402.10898

Report

Radiation Pressure Induced Oscillations of an Optically Levitating Mirror

Authors: Jha, Satyam Shekhar, Carmon, Tal, Cheng, Fan, Deych, Lev

An optical Fabry-Perot cavity with a movable mirror is a paradigmatic optomechanical system. While the mirror is usually supported by a mechanical spring, it has been shown that it is possible to keep one of the mirrors in a stable equilibrium purely b

External link: http://arxiv.org/abs/2401.00954

Report

Cavity Continuum

Authors: Cheng, Fan, Shuvayev, Vladimir, Douvidzon, Mark, Deych, Lev, Carmon, Tal

We experimentally demonstrate and numerically analyze large arrays of whispering gallery resonators. Using fluorescent mapping, we measure the spatial distribution of the cavity-ensemble's resonances, revealing that light reaches distant resonators i

External link: http://arxiv.org/abs/2312.12632

Report

A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions

Authors: Carmon, Yair, Jambulapati, Arun, Jin, Yujia, Sidford, Aaron

We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution using $\widetilde{O}(n \epsilon^{-

External link: http://arxiv.org/abs/2311.10886
