Zobrazeno 1 - 4
of 4
pro vyhledávání: '"Devillers, Benjamin"'
Humans perceive the world through multiple senses, enabling them to create a comprehensive representation of their surroundings and to generalize information across domains. For instance, when a textual description of a scene is given, humans can men
Externí odkaz:
http://arxiv.org/abs/2403.04588
Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or text-to-image
Externí odkaz:
http://arxiv.org/abs/2306.15711
Vision models trained on multimodal datasets can benefit from the wide availability of large image-caption datasets. A recent model (CLIP) was found to generalize well in zero-shot and transfer learning settings. This could imply that linguistic or "
Externí odkaz:
http://arxiv.org/abs/2104.08313
Akademický článek
Tento výsledek nelze pro nepřihlášené uživatele zobrazit.
K zobrazení výsledku je třeba se přihlásit.
K zobrazení výsledku je třeba se přihlásit.