Search results - "Goulas, Andreas"

Report

VidCtx: Context-aware Video Question Answering with Image Models

Authors: Goulas, Andreas; Mezaris, Vasileios; Patras, Ioannis

To address computational and memory limitations of Large Multimodal Models in the Video Question-Answering task, several recent methods extract textual representations per frame (e.g., by captioning) and feed them to a Large Language Model (LLM) that

External link: http://arxiv.org/abs/2412.17415
