Showing 1 - 10 of 3,640 for search: '"Snoek, P. A."'
Video-text retrieval has seen significant advancements, yet the ability of models to discern subtle differences in captions still requires verification. In this paper, we introduce a new approach for fine-grained evaluation. Our approach can be appli
External link:
http://arxiv.org/abs/2410.12407
This paper strives for motion-focused video-language representations. Existing methods to learn video-language representations use spatial-focused data, where identifying the objects and scene is often enough to distinguish the relevant caption. We i
External link:
http://arxiv.org/abs/2410.12018
Author:
Bhowmik, Aritra, Derakhshani, Mohammad Mahdi, Koelma, Dennis, Oswald, Martin R., Asano, Yuki M., Snoek, Cees G. M.
Spatial awareness is key to enable embodied multimodal AI systems. Yet, without vast amounts of spatial supervision, current Visual Language Models (VLMs) struggle at this task. In this paper, we introduce LynX, a framework that equips pretrained VLM
External link:
http://arxiv.org/abs/2410.10491
Author:
Najdenkoska, Ivona, Derakhshani, Mohammad Mahdi, Asano, Yuki M., van Noord, Nanne, Worring, Marcel, Snoek, Cees G. M.
We address the challenge of representing long captions in vision-language models, such as CLIP. By design these models are limited by fixed, absolute positional encodings, restricting inputs to a maximum of 77 tokens and hindering performance on task
External link:
http://arxiv.org/abs/2410.10034
Large language models have demonstrated impressive performance when integrated with vision models even enabling video understanding. However, evaluating these video models presents its own unique challenges, for which several benchmarks have been pro
External link:
http://arxiv.org/abs/2410.07752
In this paper, we address Generalized Category Discovery, aiming to simultaneously uncover novel categories and accurately classify known ones. Traditional methods, which lean heavily on self-supervision and contrastive learning, often fall short whe
External link:
http://arxiv.org/abs/2408.14371
Author:
Hron, Jiri, Culp, Laura, Elsayed, Gamaleldin, Liu, Rosanne, Adlam, Ben, Bileschi, Maxwell, Bohnet, Bernd, Co-Reyes, JD, Fiedel, Noah, Freeman, C. Daniel, Gur, Izzeddin, Kenealy, Kathleen, Lee, Jaehoon, Liu, Peter J., Mishra, Gaurav, Mordatch, Igor, Nova, Azade, Novak, Roman, Parisi, Aaron, Pennington, Jeffrey, Rizkowsky, Alex, Simpson, Isabelle, Sedghi, Hanie, Sohl-dickstein, Jascha, Swersky, Kevin, Vikram, Sharad, Warkentin, Tris, Xiao, Lechao, Xu, Kelvin, Snoek, Jasper, Kornblith, Simon
While many capabilities of language models (LMs) improve with increased training budget, the influence of scale on hallucinations is not yet fully understood. Hallucinations come in many forms, and there is no universally accepted definition. We thus
External link:
http://arxiv.org/abs/2408.07852
Author:
Salehi, Mohammadreza, Dorkenwald, Michael, Thoker, Fida Mohammad, Gavves, Efstratios, Snoek, Cees G. M., Asano, Yuki M.
Video-based pretraining offers immense potential for learning strong visual representations on an unprecedented scale. Recently, masked video modeling methods have shown promising scalability, yet fall short in capturing higher-level semantics due to
External link:
http://arxiv.org/abs/2407.15447
Author:
Sträter, Luc P. J., Salehi, Mohammadreza, Gavves, Efstratios, Snoek, Cees G. M., Asano, Yuki M.
In the domain of anomaly detection, methods often excel in either high-level semantic or low-level industrial benchmarks, rarely achieving cross-domain proficiency. Semantic anomalies are novelties that differ in meaning from the training set, like u
External link:
http://arxiv.org/abs/2407.12427
Author:
Nguyen, Duy-Kien, Assran, Mahmoud, Jain, Unnat, Oswald, Martin R., Snoek, Cees G. M., Chen, Xinlei
This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by
External link:
http://arxiv.org/abs/2406.09415