Showing 1 - 10 of 41 for search: '"Littwin, Etai"'
Image-based Joint-Embedding Predictive Architecture (IJEPA) offers an attractive alternative to Masked Autoencoder (MAE) for representation learning using the Masked Image Modeling framework. IJEPA drives representations to capture useful semantic in...
External link:
http://arxiv.org/abs/2410.10773
Generating user intent from a sequence of user interface (UI) actions is a core challenge in comprehensive UI understanding. Recent advancements in multimodal large language models (MLLMs) have led to substantial progress in this area, but their dema...
External link:
http://arxiv.org/abs/2409.04081
Author:
Littwin, Etai, Saremi, Omid, Advani, Madhu, Thilak, Vimal, Nakkiran, Preetum, Huang, Chen, Susskind, Joshua
Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive o...
External link:
http://arxiv.org/abs/2407.03475
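As a rough illustration of the JEPA idea summarized in this abstract, the sketch below (an assumption-laden toy, not the paper's code) trains a predictor so that the representation of one view predicts the stop-gradient representation of a semantically similar view; the loss lives in representation space rather than in pixel space as in a generative method like MAE. All module sizes and names are illustrative.

# Minimal JEPA-style training step (illustrative sketch, not the paper's setup).
import torch
import torch.nn as nn

dim = 32
encoder = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 64))
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def jepa_step(x_context, x_target):
    z_context = encoder(x_context)               # representation of the visible view
    with torch.no_grad():
        z_target = encoder(x_target)             # target representation, no gradient flows here
    loss = ((predictor(z_context) - z_target) ** 2).mean()   # predict in representation space
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage: two noisy "views" of the same underlying sample.
x = torch.randn(8, dim)
print(jepa_step(x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x)))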
Author:
Thilak, Vimal, Huang, Chen, Saremi, Omid, Dinh, Laurent, Goh, Hanlin, Nakkiran, Preetum, Susskind, Joshua M., Littwin, Etai
Joint embedding (JE) architectures have emerged as a promising avenue for acquiring transferable data representations. A key obstacle to using JE methods, however, is the inherent challenge of evaluating learned representations without access to a do...
External link:
http://arxiv.org/abs/2312.04000
Author:
Razin, Noam, Zhou, Hattie, Saremi, Omid, Thilak, Vimal, Bradley, Arwen, Nakkiran, Preetum, Susskind, Joshua, Littwin, Etai
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms. This work identifies a f...
External link:
http://arxiv.org/abs/2310.20703
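The abstract above defines RFT as maximizing a (possibly learned) reward with policy gradient algorithms; the sketch below shows the plainest such estimator (REINFORCE) on a toy categorical policy. This is only an illustration of the objective: real RFT pipelines use language-model policies and PPO-style variants, and the reward vector here stands in for a learned reward model.

# REINFORCE on a toy 5-way "policy" (illustrative sketch of the RFT objective).
import torch

logits = torch.zeros(5, requires_grad=True)          # policy parameters over 5 candidate outputs
opt = torch.optim.SGD([logits], lr=0.1)
reward = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])     # stand-in for a (possibly learned) reward

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample((64,))                      # sample outputs from the current policy
    # Policy gradient: grad E[r] = E[r * grad log p(action)], so minimize -r * log p.
    loss = -(reward[actions] * dist.log_prob(actions)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))                   # mass concentrates on the rewarded output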
Author:
Zhou, Hattie, Bradley, Arwen, Littwin, Etai, Razin, Noam, Saremi, Omid, Susskind, Josh, Bengio, Samy, Nakkiran, Preetum
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for s...
External link:
http://arxiv.org/abs/2310.16028
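As a rough illustration of the task family referred to above, the sketch below builds a parity dataset with a length split: training strings are short, test strings are longer, so only a model that has learned the true rule (XOR of all bits) rather than a length-specific shortcut will generalize. All lengths and sizes are illustrative assumptions.

# Parity with a train/test length split (illustrative sketch).
import random

def parity_example(length):
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, sum(bits) % 2                        # label: parity of the bit-string

random.seed(0)
train = [parity_example(random.randint(2, 10)) for _ in range(10_000)]   # short inputs
test = [parity_example(random.randint(20, 40)) for _ in range(1_000)]    # longer, out-of-distribution
print(train[0], test[0])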
Author:
Boix-Adsera, Enric, Saremi, Omid, Abbe, Emmanuel, Bengio, Samy, Littwin, Etai, Susskind, Joshua
We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding abstract relations, and are then tested out-of-distribution on data that contains symbols that did no...
External link:
http://arxiv.org/abs/2310.09753
Author:
Abnar, Samira, Saremi, Omid, Dinh, Laurent, Wilson, Shantel, Bautista, Miguel Angel, Huang, Chen, Thilak, Vimal, Littwin, Etai, Gu, Jiatao, Susskind, Josh, Bengio, Samy
Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that s...
External link:
http://arxiv.org/abs/2310.08866
Author:
Yang, Greg, Littwin, Etai
Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general o...
External link:
http://arxiv.org/abs/2308.01814
We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and sma...
External link:
http://arxiv.org/abs/2306.07042
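As a rough illustration of how the quantity in this abstract can be monitored, the sketch below tracks a rank proxy (stable rank) of the difference between the current and initial weights while training a toy two-layer linear model on a low-rank target. The setup is an illustrative assumption, not the paper's transformer setting.

# Track the stable rank of (current weights - initial weights) during training (illustrative sketch).
import torch

torch.manual_seed(0)
d = 32
target = torch.randn(d, 4) @ torch.randn(4, d) / d    # rank-4 target matrix
W1 = (0.01 * torch.randn(d, d)).requires_grad_()      # small initialization
W2 = (0.01 * torch.randn(d, d)).requires_grad_()
M0 = (W2 @ W1).detach()                               # end-to-end map at initialization
opt = torch.optim.SGD([W1, W2], lr=0.1)

for step in range(1, 2001):
    loss = ((W2 @ W1 - target) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step in (25, 50, 100, 200, 400, 1000, 2000):
        s = torch.linalg.svdvals((W2 @ W1 - M0).detach())
        stable_rank = ((s ** 2).sum() / s.max() ** 2).item()   # Frobenius norm^2 / spectral norm^2
        print(f"step {step:4d}  loss {loss.item():.4f}  stable rank of update {stable_rank:.2f}")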