Zobrazeno 1 - 10
of 10
pro vyhledávání: '"Alemi, Alex"'
Autor:
Lester, Brian, Lee, Jaehoon, Alemi, Alex, Pennington, Jeffrey, Roberts, Adam, Sohl-Dickstein, Jascha, Constant, Noah
In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it we
Externí odkaz:
http://arxiv.org/abs/2404.03626
Autor:
Singh, Avi, Co-Reyes, John D., Agarwal, Rishabh, Anand, Ankesh, Patil, Piyush, Garcia, Xavier, Liu, Peter J., Harrison, James, Lee, Jaehoon, Xu, Kelvin, Parisi, Aaron, Kumar, Abhishek, Alemi, Alex, Rizkowsky, Alex, Nova, Azade, Adlam, Ben, Bohnet, Bernd, Elsayed, Gamaleldin, Sedghi, Hanie, Mordatch, Igor, Simpson, Isabelle, Gur, Izzeddin, Snoek, Jasper, Pennington, Jeffrey, Hron, Jiri, Kenealy, Kathleen, Swersky, Kevin, Mahajan, Kshiteej, Culp, Laura, Xiao, Lechao, Bileschi, Maxwell L., Constant, Noah, Novak, Roman, Liu, Rosanne, Warkentin, Tris, Qian, Yundi, Bansal, Yamini, Dyer, Ethan, Neyshabur, Behnam, Sohl-Dickstein, Jascha, Fiedel, Noah
Fine-tuning language models~(LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go bey
Externí odkaz:
http://arxiv.org/abs/2312.06585
Autor:
Freeman, C. Daniel, Culp, Laura, Parisi, Aaron, Bileschi, Maxwell L, Elsayed, Gamaleldin F, Rizkowsky, Alex, Simpson, Isabelle, Alemi, Alex, Nova, Azade, Adlam, Ben, Bohnet, Bernd, Mishra, Gaurav, Sedghi, Hanie, Mordatch, Igor, Gur, Izzeddin, Lee, Jaehoon, Co-Reyes, JD, Pennington, Jeffrey, Xu, Kelvin, Swersky, Kevin, Mahajan, Kshiteej, Xiao, Lechao, Liu, Rosanne, Kornblith, Simon, Constant, Noah, Liu, Peter J., Novak, Roman, Qian, Yundi, Fiedel, Noah, Sohl-Dickstein, Jascha
We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial str
Externí odkaz:
http://arxiv.org/abs/2311.07587
Autor:
Wortsman, Mitchell, Liu, Peter J., Xiao, Lechao, Everett, Katie, Alemi, Alex, Adlam, Ben, Co-Reyes, John D., Gur, Izzeddin, Kumar, Abhishek, Novak, Roman, Pennington, Jeffrey, Sohl-dickstein, Jascha, Xu, Kelvin, Lee, Jaehoon, Gilmer, Justin, Kornblith, Simon
Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific
Externí odkaz:
http://arxiv.org/abs/2309.14322
Variational autoencoders learn unsupervised data representations, but these models frequently converge to minima that fail to preserve meaningful semantic information. For example, variational autoencoders with autoregressive decoders often collapse
Externí odkaz:
http://arxiv.org/abs/1905.07478
Autor:
Dillon, Joshua V., Langmore, Ian, Tran, Dustin, Brevdo, Eugene, Vasudevan, Srinivas, Moore, Dave, Patton, Brian, Alemi, Alex, Hoffman, Matt, Saurous, Rif A.
The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabili
Externí odkaz:
http://arxiv.org/abs/1711.10604
Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned
Externí odkaz:
http://arxiv.org/abs/1710.09599
Autor:
Fragkiadaki, Katerina, Huang, Jonathan, Alemi, Alex, Vijayanarasimhan, Sudheendra, Ricco, Susanna, Sukthankar, Rahul
Given a visual history, multiple future outcomes for a video scene are equally probable, in other words, the distribution of future outcomes has multiple modes. Multimodality is notoriously hard to handle by standard regressors or classifiers: the fo
Externí odkaz:
http://arxiv.org/abs/1705.02082
Autor:
Alemi, Alex A., Chollet, Francois, Een, Niklas, Irving, Geoffrey, Szegedy, Christian, Urban, Josef
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the p
Externí odkaz:
http://arxiv.org/abs/1606.04442
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computationa
Externí odkaz:
http://arxiv.org/abs/1602.07261