Showing 1 - 10 of 85 results for the search: '"Elliott, Desmond"'
Author:
Bavaresco, Anna, Bernardi, Raffaella, Bertolazzi, Leonardo, Elliott, Desmond, Fernández, Raquel, Gatt, Albert, Ghaleb, Esam, Giulianelli, Mario, Hanna, Michael, Koller, Alexander, Martins, André F. T., Mondorf, Philipp, Neplenbroek, Vera, Pezzelle, Sandro, Plank, Barbara, Schlangen, David, Suglia, Alessandro, Surikuchi, Aditya K, Takmaz, Ece, Testoni, Alberto
There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations …
External link:
http://arxiv.org/abs/2406.18403
Author:
Li, Wenyan, Zhang, Xinyu, Li, Jiaang, Peng, Qiwei, Tang, Raphael, Zhou, Li, Zhang, Weijia, Hu, Guimin, Yuan, Yifei, Søgaard, Anders, Hershcovich, Daniel, Elliott, Desmond
Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained …
External link:
http://arxiv.org/abs/2406.11030
Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success of retrieval …
External link:
http://arxiv.org/abs/2406.02265
Author:
Yagcioglu, Semih, İnce, Osman Batur, Erdem, Aykut, Erdem, Erkut, Elliott, Desmond, Yuret, Deniz
The rise of large-scale multimodal models has paved the pathway for groundbreaking advances in generative modeling and reasoning, unlocking transformative applications in a variety of complex tasks. However, a pressing question that remains is their …
External link:
http://arxiv.org/abs/2404.12013
Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost …
External link:
http://arxiv.org/abs/2311.00522
Pretrained machine learning models are known to perpetuate and even amplify existing biases in data, which can result in unfair outcomes that ultimately impact user experience. Therefore, it is crucial to understand the mechanisms behind those prejudices …
External link:
http://arxiv.org/abs/2310.17530
The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional approach to analysing historical documents involves converting them from images to text using OCR, a process that overlooks …
External link:
http://arxiv.org/abs/2310.18343
Multilingual image captioning has recently been tackled by training with large-scale machine translated data, which is an expensive, noisy, and time-consuming process. Without requiring any multilingual caption data, we propose LMCap, an image-blind …
External link:
http://arxiv.org/abs/2305.19821
Image captioning models are typically trained by treating all samples equally, neglecting to account for mismatched or otherwise difficult data points. In contrast, recent work has shown the effectiveness of training models by scheduling the data using …
External link:
http://arxiv.org/abs/2305.03610
Published in:
EACL 2023
Inspired by retrieval-augmented language generation and pretrained Vision and Language (V&L) encoders, we present a new approach to image captioning that generates sentences given the input image and a set of captions retrieved from a datastore, as opposed to …
External link:
http://arxiv.org/abs/2302.08268