Showing 1 - 3 of 3 results for the search: '"Ging, Simon"'
The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks …
External link:
http://arxiv.org/abs/2402.07270
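A rough illustration of the idea behind grading open-ended VQA answers against classification labels: a free-form answer can be matched to the ground-truth class or, with partial credit, to a broader ancestor in a semantic hierarchy. The hierarchy, matching rule, and credit weighting below are illustrative assumptions, not the benchmark's actual metric.

```python
# Minimal sketch (not the paper's exact metric): score a free-form VQA answer
# against a classification label and its ancestors in a semantic hierarchy.
# The hierarchy and the partial-credit rule are illustrative assumptions.

def normalize(text: str) -> set[str]:
    """Lowercase and tokenize an answer into a bag of words."""
    return set(text.lower().replace(",", " ").split())

# Hypothetical hierarchy: class -> increasingly general ancestor names.
HIERARCHY = {
    "golden retriever": ["retriever", "dog", "animal"],
}

def score_answer(generated: str, gt_class: str) -> float:
    """1.0 for an exact-class match, partial credit for a coarser but still
    correct ancestor, 0.0 otherwise."""
    answer_tokens = normalize(generated)
    if normalize(gt_class) <= answer_tokens:
        return 1.0
    for depth, ancestor in enumerate(HIERARCHY.get(gt_class, []), start=1):
        if normalize(ancestor) <= answer_tokens:
            return 1.0 / (1 + depth)  # coarser answers get less credit
    return 0.0

print(score_answer("a golden retriever lying on grass", "golden retriever"))  # 1.0
print(score_answer("it looks like a dog", "golden retriever"))                # ~0.33
```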
Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to …
External link:
http://arxiv.org/abs/2211.12914
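A minimal sketch of what zero-shot open-vocabulary querying looks like in practice: arbitrary text prompts (here, attribute phrases) are ranked by cosine similarity to an image embedding. The encode_image and encode_text functions are hypothetical stand-ins for a real vision-language encoder, and the random vectors are placeholders only.

```python
# Sketch of zero-shot open-vocabulary querying: rank arbitrary text prompts
# by cosine similarity to an image embedding.
import numpy as np

rng = np.random.default_rng(0)

def encode_image(image) -> np.ndarray:
    return rng.normal(size=512)   # placeholder for a real image encoder

def encode_text(prompt: str) -> np.ndarray:
    return rng.normal(size=512)   # placeholder for a real text encoder

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

attribute_prompts = [
    "a photo of a striped object",
    "a photo of a metallic object",
    "a photo of a wooden object",
]

image_emb = encode_image("example.jpg")
scores = {p: cosine(image_emb, encode_text(p)) for p in attribute_prompts}
print(max(scores, key=scores.get))  # highest-scoring attribute prompt
```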
Many real-world video-text tasks involve different levels of granularity, such as frames and words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this paper, we propose a Cooperative hierarchical Transformer (COOT) to …
External link:
http://arxiv.org/abs/2011.00597
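A minimal sketch of aligning video and text at several granularities: frame and word features are pooled into clip and sentence features, and again into video and paragraph features, with a similarity scored at each level. This is not the COOT architecture itself; all feature tensors below are random placeholders.

```python
# Sketch of multi-granularity video-text alignment via mean pooling and
# per-level cosine similarity (placeholder features, not the COOT model).
import numpy as np

rng = np.random.default_rng(0)
D = 64

# Hypothetical example: 2 clips x 8 frames of video, 2 sentences x 6 words of text.
frame_feats = rng.normal(size=(2, 8, D))   # per-clip frame features
word_feats = rng.normal(size=(2, 6, D))    # per-sentence word features

clip_feats = frame_feats.mean(axis=1)      # clip level      (2, D)
sent_feats = word_feats.mean(axis=1)       # sentence level  (2, D)
video_feat = clip_feats.mean(axis=0)       # video level     (D,)
para_feat = sent_feats.mean(axis=0)        # paragraph level (D,)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

clip_sim = np.mean([cosine(c, s) for c, s in zip(clip_feats, sent_feats)])
video_sim = cosine(video_feat, para_feat)
alignment = clip_sim + video_sim           # aggregate across granularities
print(f"clip/sentence: {clip_sim:.3f}, video/paragraph: {video_sim:.3f}")
```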