Search results - "Krojer, Benno"

Report

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Authors: Krojer, Benno; Vattikonda, Dheeraj; Lara, Luis; Jampani, Varun; Portelance, Eva; Pal, Christopher; Reddy, Siva

An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models

External link: http://arxiv.org/abs/2407.03471

Report

Improving Automatic VQA Evaluation Using Large Language Models

Authors: Mañas, Oscar; Krojer, Benno; Agrawal, Aishwarya

8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towa

External link: http://arxiv.org/abs/2310.02567

Report

Pragmatic Inference with a CLIP Listener for Contrastive Captioning

Authors: Ou, Jiefu; Krojer, Benno; Fried, Daniel

We propose a simple yet effective and robust method for contrastive captioning: generating discriminative captions that distinguish target images from very similar alternative distractor images. Our approach is built on a pragmatic inference procedur

External link: http://arxiv.org/abs/2306.08818

Report

Are Diffusion Models Vision-And-Language Reasoners?

Authors: Krojer, Benno; Poole-Dayan, Elinor; Voleti, Vikram; Pal, Christopher; Reddy, Siva

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generat

External link: http://arxiv.org/abs/2305.16397

Report

Image Retrieval from Contextual Descriptions

Authors: Krojer, Benno; Adlakha, Vaibhav; Vineet, Vibhav; Goyal, Yash; Ponti, Edoardo; Reddy, Siva

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a

External link: http://arxiv.org/abs/2203.15867

Report

Are Pretrained Language Models Symbolic Reasoners Over Knowledge?

Authors: Kassner, Nora; Krojer, Benno; Schütze, Hinrich

How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present

External link: http://arxiv.org/abs/2006.10413
