Showing 1 - 10 of 73 results for search: '"Paperno, Denis"'
Visual storytelling systems generate multi-sentence stories from image sequences. In this task, capturing contextual information and bridging visual variation bring additional challenges. We propose a simple yet effective framework that leverages the
External link:
http://arxiv.org/abs/2408.06259
Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. Literature has divided into two camps: While some argue that grounding allows for qualitati
External link:
http://arxiv.org/abs/2310.11938
Derivationally related words, such as "runner" and "running", exhibit semantic differences which also elicit different visual scenarios. In this paper, we ask whether Vision and Language (V&L) models capture such distinctions at the morphological le
External link:
http://arxiv.org/abs/2309.11252
Multimodal embeddings aim to enrich the semantic information in neural representations of language compared to text-only models. While different embeddings exhibit different applicability and performance on downstream tasks, little is known about the
External link:
http://arxiv.org/abs/2306.02348
Authors:
Tan, Shaomu, Paperno, Denis
In many real-world scenarios, the absence of an external knowledge source like Wikipedia restricts question answering systems to rely on latent internal knowledge in limited dialogue data. In addition, humans often seek answers by asking several questio
External link:
http://arxiv.org/abs/2212.08946
Accurately reporting what objects are depicted in an image is largely a solved problem in automatic caption generation. The next big challenge on the way to truly humanlike captioning is being able to incorporate the context of the image and related
External link:
http://arxiv.org/abs/2210.04806
Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each componen
External link:
http://arxiv.org/abs/2206.03529
Word embeddings have advanced the state of the art in NLP across numerous tasks. Understanding the contents of dense neural representations is of utmost interest to the computational semantics community. We propose to focus on relating these opaque w
External link:
http://arxiv.org/abs/2205.13858
Can language models learn grounded representations from text distribution alone? This question is both central and recurrent in natural language processing; authors generally agree that grounding requires more than textual distribution. We propose to
External link:
http://arxiv.org/abs/2108.07708
Published in:
Proceedings of the 28th International Conference on Computational Linguistics (2020) 3737-3749
Compositionality is a widely discussed property of natural languages, although its exact definition has been elusive. We focus on the proposal that compositionality can be assessed by measuring meaning-form correlation. We analyze meaning-form correl
External link:
http://arxiv.org/abs/2012.03833