Search results

Report

Training LLMs over Neurally Compressed Text

Authors: Lester, Brian; Lee, Jaehoon; Alemi, Alex; Pennington, Jeffrey; Roberts, Adam; Sohl-Dickstein, Jascha; Constant, Noah

In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it we

External link: http://arxiv.org/abs/2404.03626

Report

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go bey

External link: http://arxiv.org/abs/2312.06585

Report

Frontier Language Models are not Robust to Adversarial Arithmetic, or 'What do I need to say so you agree 2+2=5?'

We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial str

External link: http://arxiv.org/abs/2311.07587

Report

Small-scale proxies for large-scale Transformer training instabilities

Authors: Wortsman, Mitchell; Liu, Peter J.; Xiao, Lechao; Everett, Katie; Alemi, Alex; Adlam, Ben; Co-Reyes, John D.; Gur, Izzeddin; Kumar, Abhishek; Novak, Roman; Pennington, Jeffrey; Sohl-Dickstein, Jascha; Xu, Kelvin; Lee, Jaehoon; Gilmer, Justin; Kornblith, Simon

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific

External link: http://arxiv.org/abs/2309.14322

Report

Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces

Authors: Seybold, Bryan; Fertig, Emily; Alemi, Alex; Fischer, Ian

Variational autoencoders learn unsupervised data representations, but these models frequently converge to minima that fail to preserve meaningful semantic information. For example, variational autoencoders with autoregressive decoders often collapse

External link: http://arxiv.org/abs/1905.07478

Report

TensorFlow Distributions

Authors: Dillon, Joshua V.; Langmore, Ian; Tran, Dustin; Brevdo, Eugene; Vasudevan, Srinivas; Moore, Dave; Patton, Brian; Alemi, Alex; Hoffman, Matt; Saurous, Rif A.

The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabili

External link: http://arxiv.org/abs/1711.10604

Report

Watch Your Step: Learning Node Embeddings via Graph Attention

Authors: Abu-El-Haija, Sami; Perozzi, Bryan; Al-Rfou, Rami; Alemi, Alex

Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned

External link: http://arxiv.org/abs/1710.09599

Report

Motion Prediction Under Multimodality with Conditional Stochastic Networks

Authors: Fragkiadaki, Katerina; Huang, Jonathan; Alemi, Alex; Vijayanarasimhan, Sudheendra; Ricco, Susanna; Sukthankar, Rahul

Given a visual history, multiple future outcomes for a video scene are equally probable, in other words, the distribution of future outcomes has multiple modes. Multimodality is notoriously hard to handle by standard regressors or classifiers: the fo

External link: http://arxiv.org/abs/1705.02082

Report

DeepMath - Deep Sequence Models for Premise Selection

Authors: Alemi, Alex A.; Chollet, Francois; Een, Niklas; Irving, Geoffrey; Szegedy, Christian; Urban, Josef

We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the p

External link: http://arxiv.org/abs/1606.04442

Report

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Authors: Szegedy, Christian; Ioffe, Sergey; Vanhoucke, Vincent; Alemi, Alex

Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computationa

External link: http://arxiv.org/abs/1602.07261
