Výsledky vyhledávání - "Xiao, Lechao"

Report

Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling

Autor: Xiao, Lechao

The remarkable success of large language pretraining and the discovery of scaling laws signify a paradigm shift in machine learning. Notably, the primary objective has evolved from minimizing generalization error to reducing approximation error, and

Externí odkaz: http://arxiv.org/abs/2409.15156

Zobrazit plný text záznamu

Report

Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

While many capabilities of language models (LMs) improve with increased training budget, the influence of scale on hallucinations is not yet fully understood. Hallucinations come in many forms, and there is no universally accepted definition. We thus

Externí odkaz: http://arxiv.org/abs/2408.07852

Zobrazit plný text záznamu

Report

Scaling Exponents Across Parameterizations and Optimizers

Autor: Everett, Katie, Xiao, Lechao, Wortsman, Mitchell, Alemi, Alexander A., Novak, Roman, Liu, Peter J., Gur, Izzeddin, Sohl-Dickstein, Jascha, Kaelbling, Leslie Pack, Lee, Jaehoon, Pennington, Jeffrey

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on pa

Externí odkaz: http://arxiv.org/abs/2407.05872

Zobrazit plný text záznamu

Report

4+3 Phases of Compute-Optimal Neural Scaling Laws

Autor: Paquette, Elliot, Paquette, Courtney, Xiao, Lechao, Pennington, Jeffrey

We consider the three parameter solvable neural scaling model introduced by Maloney, Roberts, and Sully. The model has three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new pre

Externí odkaz: http://arxiv.org/abs/2405.15074

Zobrazit plný text záznamu

Report

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Fine-tuning language models~(LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go bey

Externí odkaz: http://arxiv.org/abs/2312.06585

Zobrazit plný text záznamu

Report

Frontier Language Models are not Robust to Adversarial Arithmetic, or 'What do I need to say so you agree 2+2=5?

We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial str

Externí odkaz: http://arxiv.org/abs/2311.07587

Zobrazit plný text záznamu

Report

Small-scale proxies for large-scale Transformer training instabilities

Autor: Wortsman, Mitchell, Liu, Peter J., Xiao, Lechao, Everett, Katie, Alemi, Alex, Adlam, Ben, Co-Reyes, John D., Gur, Izzeddin, Kumar, Abhishek, Novak, Roman, Pennington, Jeffrey, Sohl-dickstein, Jascha, Xu, Kelvin, Lee, Jaehoon, Gilmer, Justin, Kornblith, Simon

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific

Externí odkaz: http://arxiv.org/abs/2309.14322

Zobrazit plný text záznamu

Report

Fast Neural Kernel Embeddings for General Activations

Autor: Han, Insu, Zandieh, Amir, Lee, Jaehoon, Novak, Roman, Xiao, Lechao, Karbasi, Amin

Infinite width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods was limited in large

Externí odkaz: http://arxiv.org/abs/2209.04121

Zobrazit plný text záznamu

Report

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

Autor: Xiao, Lechao, Pennington, Jeffrey

Although learning in high dimensions is commonly believed to suffer from the curse of dimensionality, modern machine learning methods often exhibit an astonishing power to tackle a wide range of challenging real-world learning problems without using

Externí odkaz: http://arxiv.org/abs/2207.04612

Zobrazit plný text záznamu

Report

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

Autor: Xiao, Lechao, Hu, Hong, Misiakiewicz, Theodor, Lu, Yue M., Pennington, Jeffrey

As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theor

Externí odkaz: http://arxiv.org/abs/2205.14846

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání