Showing 1 - 10 of 22 for the search: '"Thilak, Vimal"'
Image-based Joint-Embedding Predictive Architecture (IJEPA) offers an attractive alternative to Masked Autoencoder (MAE) for representation learning using the Masked Image Modeling framework. IJEPA drives representations to capture useful semantic in…
External link:
http://arxiv.org/abs/2410.10773
Author:
Aldeneh, Zakaria, Thilak, Vimal, Higuchi, Takuya, Theobald, Barry-John, Likhomanenko, Tatiana
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via self-supervised learning (SSL). Traditionally, assessing the performance of these encoders is resource-intensive and require…
External link:
http://arxiv.org/abs/2409.10787
Author:
Littwin, Etai, Saremi, Omid, Advani, Madhu, Thilak, Vimal, Nakkiran, Preetum, Huang, Chen, Susskind, Joshua
Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive o…
External link:
http://arxiv.org/abs/2407.03475
Author:
Thilak, Vimal, Huang, Chen, Saremi, Omid, Dinh, Laurent, Goh, Hanlin, Nakkiran, Preetum, Susskind, Joshua M., Littwin, Etai
Joint embedding (JE) architectures have emerged as a promising avenue for acquiring transferable data representations. A key obstacle to using JE methods, however, is the inherent challenge of evaluating learned representations without access to a do…
External link:
http://arxiv.org/abs/2312.04000
Author:
Razin, Noam, Zhou, Hattie, Saremi, Omid, Thilak, Vimal, Bradley, Arwen, Nakkiran, Preetum, Susskind, Joshua, Littwin, Etai
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms. This work identifies a f…
External link:
http://arxiv.org/abs/2310.20703
Author:
Abnar, Samira, Saremi, Omid, Dinh, Laurent, Wilson, Shantel, Bautista, Miguel Angel, Huang, Chen, Thilak, Vimal, Littwin, Etai, Gu, Jiatao, Susskind, Josh, Bengio, Samy
Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that s…
External link:
http://arxiv.org/abs/2310.08866
The grokking phenomenon as reported by Power et al. (arXiv:2201.02177) refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnin…
External link:
http://arxiv.org/abs/2206.04817
Deep linear networks trained with gradient descent yield low rank solutions, as is typically studied in matrix factorization. In this paper, we take a step further and analyze implicit rank regularization in autoencoders. We show greedy learning of l…
External link:
http://arxiv.org/abs/2107.01301
Author:
Littwin, Etai, Saremi, Omid, Zhai, Shuangfei, Thilak, Vimal, Goh, Hanlin, Susskind, Joshua M., Yang, Greg
We analyze the learning dynamics of infinitely wide neural networks with a finite sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite width network allows data dependent feature learning in its bottleneck…
External link:
http://arxiv.org/abs/2107.00364
Author:
Creusere, Charles D., Thilak, Vimal
Publication in the conference proceedings of EUSIPCO, Vienna, Austria, 2004
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::f8c7b8ed610fa39e8130409e95f341f1