Showing 1 - 10 of 5,975 for search: '"Shukor IN"'
Author:
Fini, Enrico, Shukor, Mustafa, Li, Xiujun, Dufter, Philipp, Klein, Michal, Haldimann, David, Aitharaju, Sai, da Costa, Victor Guilherme Turrisi, Béthune, Louis, Gan, Zhe, Toshev, Alexander T, Eichner, Marcin, Nabi, Moin, Yang, Yinfei, Susskind, Joshua M., El-Nouby, Alaaeldin
We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we…
External link:
http://arxiv.org/abs/2411.14402
Author:
Shukor, Mustafa, Cord, Matthieu
Large Language Models (LLMs) have demonstrated remarkable success in both textual and multimodal domains. However, this success often comes with substantial computational costs, particularly when handling lengthy sequences of multimodal inputs. This…
External link:
http://arxiv.org/abs/2410.09454
Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advancements towards the interpretability of these models, understanding internal representations of LMMs remains…
External link:
http://arxiv.org/abs/2406.08074
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In this paper…
External link:
http://arxiv.org/abs/2406.02842
Author:
Shukor, Mustafa, Cord, Matthieu
Large Language Models (LLMs) have demonstrated impressive performance on multimodal tasks, without any multimodal finetuning. They are the building block for Large Multimodal Models, yet we still lack a proper understanding of their success. In this…
External link:
http://arxiv.org/abs/2405.16700
Author:
Baldassini, Folco Bertini, Shukor, Mustafa, Cord, Matthieu, Soulier, Laure, Piwowarski, Benjamin
Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context Learning (ICL) with minimal demonstration examples. In this work, we present a…
External link:
http://arxiv.org/abs/2404.15736
Author:
Corradini, Barbara Toniella, Shukor, Mustafa, Couairon, Paul, Couairon, Guillaume, Scarselli, Franco, Cord, Matthieu
Foundation models have exhibited unprecedented capabilities in tackling many domains and tasks. Models such as CLIP are currently widely used to bridge cross-modal representations, and text-to-image diffusion models are arguably the leading models in…
External link:
http://arxiv.org/abs/2403.20105
The abilities of large language models (LLMs) have recently progressed to unprecedented levels, paving the way to novel applications in a wide variety of areas. In computer vision, LLMs can be used to prime vision-language tasks such as image captioning…
External link:
http://arxiv.org/abs/2403.13499
Author:
Razak, Nur Ain Shuhada Ab, Habib, Syahir, Shukor, Mohd Yunus Abd, Alias, Siti Aisyah, Smykla, Jerzy, Yasid, Nur Adeela
Despite its remoteness from other continents, the Antarctic region cannot escape the aftermath of human activities, as it is highly influenced by anthropogenic impacts that occur in both regional and global contexts. Contamination by microplastics…
External link:
http://arxiv.org/abs/2401.02096
Foundation models have excelled in various tasks but are often evaluated on general benchmarks. The adaptation of these models for specific domains, such as remote sensing imagery, remains an underexplored area. In remote sensing, precise building…
External link:
http://arxiv.org/abs/2310.01845