Showing 1 - 10 of 1,325 for search: '"multimodal retrieval"'
Author:
Zhai, Wenjia
Traditional Retrieval-Augmented Generation (RAG) methods are limited by their reliance on a fixed number of retrieved documents, often resulting in incomplete or noisy information that undermines task performance. Although recent adaptive approaches
External link:
http://arxiv.org/abs/2410.11321
MLLM agents demonstrate potential for complex embodied tasks by retrieving multimodal task-relevant trajectory data. However, current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglectin
External link:
http://arxiv.org/abs/2410.03450
Authors:
Zhu, Zhengyuan; Lee, Daniel; Zhang, Hong; Harsha, Sai Sree; Feujio, Loic; Maharaj, Akash; Li, Yunyao
Recent advancements in retrieval-augmented generation (RAG) have demonstrated impressive performance in the question-answering (QA) task. However, most previous works predominantly focus on text-based answers. While some studies address multimodal da
External link:
http://arxiv.org/abs/2408.08521
Plant disease recognition is a critical task that ensures crop health and mitigates the damage caused by diseases. A handy tool that enables farmers to receive a diagnosis based on query pictures or the text description of suspicious plants is in hig
External link:
http://arxiv.org/abs/2408.14723
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in processing and generating content across multiple data modalities. However, a significant drawback of MLLMs is their reliance on static training data, leading to ou
External link:
http://arxiv.org/abs/2407.21439
Multimodal foundation models hold significant potential for automating radiology report generation, thereby assisting clinicians in diagnosing cardiac diseases. However, generated reports often suffer from serious factual inaccuracy. In this paper, w
External link:
http://arxiv.org/abs/2407.15268
In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious,
External link:
http://arxiv.org/abs/2406.11251
Millions of news articles published online daily can overwhelm readers. Headlines and entity (topic) tags are essential for guiding readers to decide if the content is worth their time. While headline generation has been extensively studied, tag gene
External link:
http://arxiv.org/abs/2406.03776
Published in:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Visual Word Sense Disambiguation (VWSD) is a novel challenging task with the goal of retrieving an image among a set of candidates, which better represents the meaning of an ambiguous word within a given context. In this paper, we make a substantial
External link:
http://arxiv.org/abs/2310.14025