Showing 1 - 10 of 311 for search: '"Rabbat, Michael"'
Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from… (a toy sketch of the zero-shot setup follows the link below)
External link:
http://arxiv.org/abs/2404.16717
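The entry above describes open-world, zero-shot classification with a vision-language model: class names are turned into text prompts, embedded, and compared against the image embedding, so new classes need no retraining. Below is a minimal sketch of that setup, assuming a CLIP-style joint embedding space; the two embedding functions are hypothetical placeholders, and the prompt template is an illustrative convention, not the paper's.

```python
# Minimal sketch of zero-shot classification in a joint image-text
# embedding space. embed_image / embed_text are hypothetical stand-ins
# for a real vision-language model such as CLIP.
import numpy as np

def embed_image(image: np.ndarray) -> np.ndarray:
    # Placeholder: a real model would return a learned image embedding.
    return np.random.default_rng(0).normal(size=512)

def embed_text(prompt: str) -> np.ndarray:
    # Placeholder: a real model would return a learned text embedding.
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).normal(size=512)

def zero_shot_classify(image: np.ndarray, class_names: list[str]) -> str:
    # Embed the image once; adding a class is just adding a prompt,
    # which is why no retraining is needed.
    img = embed_image(image)
    img /= np.linalg.norm(img)
    scores = []
    for name in class_names:
        txt = embed_text(f"a photo of a {name}")  # illustrative template
        txt /= np.linalg.norm(txt)
        scores.append(float(img @ txt))           # cosine similarity
    return class_names[int(np.argmax(scores))]

print(zero_shot_classify(np.zeros((224, 224, 3)), ["cat", "dog", "truck"]))
```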
Author:
Lehnert, Lucas, Sukhbaatar, Sainbayar, Su, DiJia, Zheng, Qinqing, Mcvay, Paul, Rabbat, Michael, Tian, Yuandong
While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers…
External link:
http://arxiv.org/abs/2402.14083
Author:
Bardes, Adrien, Garrido, Quentin, Ponce, Jean, Chen, Xinlei, Rabbat, Michael, LeCun, Yann, Assran, Mahmoud, Ballas, Nicolas
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders…
External link:
http://arxiv.org/abs/2404.08471
Author:
Shi, Hao-Jun Michael, Lee, Tsung-Hsien, Iwasaki, Shintaro, Gallego-Posada, Jose, Li, Zhijing, Rangadurai, Kaushik, Mudigere, Dheevatsa, Rabbat, Michael
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad… (a compact sketch of this preconditioner follows the link below)
External link:
http://arxiv.org/abs/2309.06497
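The Shampoo entry above describes a block-diagonal preconditioner whose blocks are coarse Kronecker-product approximations to full-matrix AdaGrad. The sketch below shows that update for a single matrix-shaped parameter: two factor statistics, one per axis, are accumulated from gradient outer products, and their inverse fourth roots precondition the gradient. Random gradients stand in for a real training loop, and the step size and epsilon are illustrative, not the paper's tuned defaults.

```python
# Sketch of the Shampoo update for one matrix parameter block:
# the Kronecker product of inv_root(L, 4) and inv_root(R, 4) acts like
# an inverse square root of the full second-moment statistic.
import numpy as np

def inv_root(M: np.ndarray, p: int, eps: float = 1e-6) -> np.ndarray:
    # Inverse p-th root of a symmetric PSD matrix via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, eps, None) ** (-1.0 / p)) @ V.T

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # one parameter block
L = np.zeros((8, 8))          # left (row-space) factor statistic
R = np.zeros((4, 4))          # right (column-space) factor statistic

for step in range(100):
    G = rng.normal(size=W.shape)  # stand-in for a stochastic gradient
    L += G @ G.T                  # accumulate per-axis second moments
    R += G.T @ G
    W -= 0.01 * inv_root(L, 4) @ G @ inv_root(R, 4)

print(np.linalg.norm(W))
```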
Author:
Oquab, Maxime, Darcet, Timothée, Moutakanni, Théo, Vo, Huy, Szafraniec, Marc, Khalidov, Vasil, Fernandez, Pierre, Haziza, Daniel, Massa, Francisco, El-Nouby, Alaaeldin, Assran, Mahmoud, Ballas, Nicolas, Galuba, Wojciech, Howes, Russell, Huang, Po-Yao, Li, Shang-Wen, Misra, Ishan, Rabbat, Michael, Sharma, Vasu, Synnaeve, Gabriel, Xu, Hu, Jegou, Hervé, Mairal, Julien, Labatut, Patrick, Joulin, Armand, Bojanowski, Piotr
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features…
External link:
http://arxiv.org/abs/2304.07193
Author:
Yousefpour, Ashkan, Guo, Shen, Shenoy, Ashish, Ghosh, Sayan, Stock, Pierre, Maeng, Kiwan, Krüger, Schalk-Willem, Rabbat, Michael, Wu, Carole-Jean, Mironov, Ilya
The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets. As a consequence, the amount of compute used in training state-of-the-art models is exponentially increasing (doubling every…)
External link:
http://arxiv.org/abs/2303.14604
Author:
Assran, Mahmoud, Duval, Quentin, Misra, Ishan, Bojanowski, Piotr, Vincent, Pascal, Rabbat, Michael, LeCun, Yann, Ballas, Nicolas
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images… (the predictive objective is sketched after the link below)
External link:
http://arxiv.org/abs/2301.08243
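The I-JEPA entry above describes a non-generative objective: predict the representations of masked target blocks from a visible context block, with the loss computed in representation space rather than pixel space. The toy sketch below exists only to make the shapes and the loss concrete; the linear encoders, mean-pooled predictor, and patch indices are all hypothetical simplifications.

```python
# Toy sketch of a joint-embedding predictive objective: predict target
# representations from context representations; nothing is reconstructed
# in pixel space. Encoders here are hypothetical linear maps.
import numpy as np

rng = np.random.default_rng(0)
D_PATCH, D_REPR, N_PATCHES = 64, 32, 16

context_encoder = rng.normal(scale=0.1, size=(D_PATCH, D_REPR))
target_encoder = context_encoder.copy()  # in practice an EMA copy, held fixed
predictor = rng.normal(scale=0.1, size=(D_REPR, D_REPR))

patches = rng.normal(size=(N_PATCHES, D_PATCH))  # patchified image
target_idx = [3, 7, 11]                          # masked target block
context_idx = [i for i in range(N_PATCHES) if i not in target_idx]

ctx_repr = patches[context_idx] @ context_encoder
pred = ctx_repr.mean(axis=0) @ predictor         # crude pooled predictor
tgt_repr = (patches[target_idx] @ target_encoder).mean(axis=0)  # fixed targets
loss = np.mean((pred - tgt_repr) ** 2)           # L2 in representation space
print(f"representation-space loss: {loss:.4f}")
```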
Author:
Wortsman, Mitchell, Gururangan, Suchin, Li, Shen, Farhadi, Ali, Schmidt, Ludwig, Rabbat, Michael, Morcos, Ari S.
When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned… (the two regimes are contrasted in the sketch below)
External link:
http://arxiv.org/abs/2210.11948
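The lo-fi entry above contrasts standard distributed fine-tuning, which averages gradients across nodes at every step, with completely local fine-tuning. The sketch below runs the local regime on toy linear regressors; merging the independent runs by a single weight average at the end is an assumption about one natural way to combine them, not necessarily the paper's exact recipe.

```python
# Sketch of communication-free ("lo-fi"-style) fine-tuning: each node
# fine-tunes the same pretrained weights on its own shard, with no
# gradient communication, then the weights are merged once at the end.
import numpy as np

def local_finetune(w0, X, y, lr=0.1, steps=50):
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)  # plain least-squares GD
    return w

rng = np.random.default_rng(0)
w_pretrained = rng.normal(size=5)
shards = [(rng.normal(size=(32, 5)), rng.normal(size=32)) for _ in range(4)]

local_models = [local_finetune(w_pretrained, X, y) for X, y in shards]
w_merged = np.mean(local_models, axis=0)  # assumed merge: simple averaging
print(w_merged)
```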
An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to the fact that… (both kinds are made concrete in the sketch below)
External link:
http://arxiv.org/abs/2210.08090
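To make the two definitions above concrete, here is a minimal FedAvg-style round in which each client samples from its own distribution (data heterogeneity) and completes a different number of local steps (system heterogeneity, e.g. faster versus slower devices). FedAvg itself is the standard algorithm; the client setup is purely illustrative.

```python
# Minimal FedAvg loop with heterogeneous clients: per-client data
# distributions (data heterogeneity) and per-client local step counts
# (system heterogeneity). All values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
D = 5
clients = []
for _ in range(4):
    mean = rng.normal(size=D)               # each client's own distribution
    X = rng.normal(loc=mean, size=(64, D))
    y = X @ np.ones(D) + rng.normal(scale=0.1, size=64)
    steps = int(rng.integers(1, 10))        # each client's compute budget
    clients.append((X, y, steps))

w_global = np.zeros(D)
for _ in range(20):                         # communication rounds
    updates = []
    for X, y, steps in clients:
        w = w_global.copy()
        for _ in range(steps):              # heterogeneous local work
            w -= 0.05 * X.T @ (X @ w - y) / len(y)
        updates.append(w)
    w_global = np.mean(updates, axis=0)     # server averages client models
print(w_global)
```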
Author:
Assran, Mahmoud, Balestriero, Randall, Duval, Quentin, Bordes, Florian, Misra, Ishan, Bojanowski, Piotr, Vincent, Pascal, Rabbat, Michael, Ballas, Nicolas
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked prior to learn… (the batch-level coupling is sketched below)
External link:
http://arxiv.org/abs/2210.07277
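The last entry points out that methods such as SimCLR, VICReg, SwAV, and MSN define their pretraining task through mini-batch statistics. The sketch below computes a simplified, one-sided SimCLR-style contrastive loss to show that coupling: each example's positive is scored against every other batch member as a negative, so the objective depends on the composition of the whole mini-batch rather than on examples in isolation.

```python
# Simplified one-sided SimCLR-style (InfoNCE) loss: the normalizer sums
# over the whole batch, so the task is defined by mini-batch statistics
# rather than by individual examples.
import numpy as np

def nt_xent(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # batch-by-batch similarities
    labels = np.arange(len(z1))                    # positives on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[labels, labels].mean())

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(nt_xent(z1, z2))
```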