Výsledky vyhledávání - "Littwin, Etai"

Report

Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning

Autor: Littwin, Etai, Thilak, Vimal, Gopalakrishnan, Anand

Image-based Joint-Embedding Predictive Architecture (IJEPA) offers an attractive alternative to Masked Autoencoder (MAE) for representation learning using the Masked Image Modeling framework. IJEPA drives representations to capture useful semantic in

Externí odkaz: http://arxiv.org/abs/2410.10773

Zobrazit plný text záznamu

Report

UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity

Autor: Fu, Yicheng, Anantha, Raviteja, Vashisht, Prabal, Cheng, Jianpeng, Littwin, Etai

Generating user intent from a sequence of user interface (UI) actions is a core challenge in comprehensive UI understanding. Recent advancements in multimodal large language models (MLLMs) have led to substantial progress in this area, but their dema

Externí odkaz: http://arxiv.org/abs/2409.04081

Zobrazit plný text záznamu

Report

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Autor: Littwin, Etai, Saremi, Omid, Advani, Madhu, Thilak, Vimal, Nakkiran, Preetum, Huang, Chen, Susskind, Joshua

Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive o

Externí odkaz: http://arxiv.org/abs/2407.03475

Zobrazit plný text záznamu

Report

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

Autor: Thilak, Vimal, Huang, Chen, Saremi, Omid, Dinh, Laurent, Goh, Hanlin, Nakkiran, Preetum, Susskind, Joshua M., Littwin, Etai

Joint embedding (JE) architectures have emerged as a promising avenue for acquiring transferable data representations. A key obstacle to using JE methods, however, is the inherent challenge of evaluating learned representations without access to a do

Externí odkaz: http://arxiv.org/abs/2312.04000

Zobrazit plný text záznamu

Report

Vanishing Gradients in Reinforcement Finetuning of Language Models

Autor: Razin, Noam, Zhou, Hattie, Saremi, Omid, Thilak, Vimal, Bradley, Arwen, Nakkiran, Preetum, Susskind, Joshua, Littwin, Etai

Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms. This work identifies a f

Externí odkaz: http://arxiv.org/abs/2310.20703

Zobrazit plný text záznamu

Report

What Algorithms can Transformers Learn? A Study in Length Generalization

Autor: Zhou, Hattie, Bradley, Arwen, Littwin, Etai, Razin, Noam, Saremi, Omid, Susskind, Josh, Bengio, Samy, Nakkiran, Preetum

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for s

Externí odkaz: http://arxiv.org/abs/2310.16028

Zobrazit plný text záznamu

Report

When can transformers reason with abstract symbols?

Autor: Boix-Adsera, Enric, Saremi, Omid, Abbe, Emmanuel, Bengio, Samy, Littwin, Etai, Susskind, Joshua

We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding abstract relations, and are then tested out-of-distribution on data that contains symbols that did no

Externí odkaz: http://arxiv.org/abs/2310.09753

Zobrazit plný text záznamu

Report

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Autor: Abnar, Samira, Saremi, Omid, Dinh, Laurent, Wilson, Shantel, Bautista, Miguel Angel, Huang, Chen, Thilak, Vimal, Littwin, Etai, Gu, Jiatao, Susskind, Josh, Bengio, Samy

Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that s

Externí odkaz: http://arxiv.org/abs/2310.08866

Zobrazit plný text záznamu

Report

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Autor: Yang, Greg, Littwin, Etai

Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general o

Externí odkaz: http://arxiv.org/abs/2308.01814

Zobrazit plný text záznamu

Report

Transformers learn through gradual rank increase

Autor: Boix-Adsera, Enric, Littwin, Etai, Abbe, Emmanuel, Bengio, Samy, Susskind, Joshua

We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and sma

Externí odkaz: http://arxiv.org/abs/2306.07042

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání