Showing 1 - 7 of 7 for search: '"Salewski, Leonard"'
Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language into text,
External link:
http://arxiv.org/abs/2311.08396
Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer
External link:
http://arxiv.org/abs/2311.05043
In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context. We ask LLMs to assume differ
External link:
http://arxiv.org/abs/2305.14930
To generate proper captions for videos, the inference needs to identify relevant concepts and pay attention to the spatial relationships between them as well as to the temporal development in the clip. Our end-to-end encoder-decoder video captioning
External link:
http://arxiv.org/abs/2208.09266
Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scal
External link:
http://arxiv.org/abs/2204.02380
Author:
Kayser, Maxime, Camburu, Oana-Maria, Salewski, Leonard, Emde, Cornelius, Do, Virginie, Akata, Zeynep, Lukasiewicz, Thomas
Recently, there has been an increasing number of efforts to introduce models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing, because they can provide human-fri
External link:
http://arxiv.org/abs/2105.03761
Transferring learned models to novel tasks is a challenging problem, particularly if only very few labeled examples are available. Although this few-shot learning setup has received a lot of attention recently, most proposed methods focus on discrimi
External link:
http://arxiv.org/abs/1907.09557