Showing 1 - 10 of 30 for search: '"Henaff, Olivier"'
Author:
Roth, Karsten, Udandarao, Vishaal, Dziadzio, Sebastian, Prabhu, Ameya, Cherti, Mehdi, Vinyals, Oriol, Hénaff, Olivier, Albanie, Samuel, Bethge, Matthias, Akata, Zeynep
Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly…
External link:
http://arxiv.org/abs/2408.14471
Author:
Beyer, Lucas, Steiner, Andreas, Pinto, André Susano, Kolesnikov, Alexander, Wang, Xiao, Salz, Daniel, Neumann, Maxim, Alabdulmohsin, Ibrahim, Tschannen, Michael, Bugliarello, Emanuele, Unterthiner, Thomas, Keysers, Daniel, Koppula, Skanda, Liu, Fangyu, Grycner, Adam, Gritsenko, Alexey, Houlsby, Neil, Kumar, Manoj, Rong, Keran, Eisenschlos, Julian, Kabra, Rishabh, Bauer, Matthias, Bošnjak, Matko, Chen, Xi, Minderer, Matthias, Voigtlaender, Paul, Bica, Ioana, Balazevic, Ivana, Puigcerver, Joan, Papalampidi, Pinelopi, Henaff, Olivier, Xiong, Xi, Soricut, Radu, Harmsen, Jeremiah, Zhai, Xiaohua
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong…
External link:
http://arxiv.org/abs/2407.07726
Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the…
External link:
http://arxiv.org/abs/2406.17711
With the advent and recent ubiquity of foundation models, continual learning (CL) has recently shifted from continual training from scratch to the continual adaptation of pretrained models, seeing particular success on rehearsal-free CL benchmarks (R…
External link:
http://arxiv.org/abs/2406.09384
Author:
Balažević, Ivana, Shi, Yuge, Papalampidi, Pinelopi, Chaabouni, Rahma, Koppula, Skanda, Hénaff, Olivier J.
Most transformer-based video encoders are limited to short temporal contexts due to their quadratic complexity. While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complex…
External link:
http://arxiv.org/abs/2402.05861
Published in:
Transactions on Machine Learning Research, Jun 2024
Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the be…
External link:
http://arxiv.org/abs/2312.11436
Author:
Evans, Talfan, Pathak, Shreya, Merzic, Hamza, Schwarz, Jonathan, Tanno, Ryutaro, Henaff, Olivier J.
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples. Despite their appeal, these methods hav…
External link:
http://arxiv.org/abs/2312.05328
Author:
Roth, Karsten, Thede, Lukas, Koepke, Almut Sophia, Vinyals, Oriol, Hénaff, Olivier, Akata, Zeynep
Training deep networks requires various design decisions regarding for instance their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from the data.
External link:
http://arxiv.org/abs/2310.17653
Author:
Balažević, Ivana, Steiner, David, Parthasarathy, Nikhil, Arandjelović, Relja, Hénaff, Olivier J.
In-context learning – the ability to configure a model's behavior with different prompts – has revolutionized the field of natural language processing, alleviating the need for task-specific models and paving the way for g…
External link:
http://arxiv.org/abs/2306.01667
Author:
Arandjelović, Relja, Andonian, Alex, Mensch, Arthur, Hénaff, Olivier J., Alayrac, Jean-Baptiste, Zisserman, Andrew
The core problem in zero-shot open vocabulary detection is how to align visual and text features, so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vi…
External link:
http://arxiv.org/abs/2303.13518