Showing 1 - 10 of 613 results for search: '"Magalhães, João P."'
Methods based on Contrastive Language-Image Pre-training (CLIP) are nowadays extensively used in support of vision-and-language tasks involving remote sensing data, such as cross-modal retrieval. The adaptation of CLIP to this specific domain has rel
External link:
http://arxiv.org/abs/2410.23370
Conversational systems must be robust to user interactions that naturally exhibit diverse conversational traits. Capturing and simulating these diverse traits coherently and efficiently presents a complex challenge. This paper introduces Multi-Trait
External link:
http://arxiv.org/abs/2410.12891
Guiding users through complex procedural plans is an inherently multimodal task in which having visually illustrated plan steps is crucial to delivering effective plan guidance. However, existing works on plan-following language models (LMs) often ar
External link:
http://arxiv.org/abs/2409.19074
Published in:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024) (2024) 1280-1287
This paper describes our approach to the SemEval-2024 safe biomedical Natural Language Inference for Clinical Trials (NLI4CT) task, which concerns classifying statements about Clinical Trial Reports (CTRs). We explored the capabilities of Mistral-7B,
External link:
http://arxiv.org/abs/2408.03127
SLVideo is a video moment retrieval system for Sign Language videos that incorporates facial expressions, addressing this gap in existing technology. The system extracts embedding representations for the hand and face signs from video frames to captu
External link:
http://arxiv.org/abs/2407.15668
Generated video scenes for action-centric sequence descriptions, such as recipe instructions and do-it-yourself projects, often include non-linear patterns, where the next video may need to be visually consistent not with the immediately preceding vi
External link:
http://arxiv.org/abs/2407.11814
Authors:
Bordalo, João, Ramos, Vasco, Valério, Rodrigo, Glória-Silva, Diogo, Bitton, Yonatan, Yarom, Michal, Szpektor, Idan, Magalhaes, Joao
Multistep instructions, such as recipes and how-to guides, greatly benefit from visual aids, such as a series of images that accompany the instruction steps. While Large Language Models (LLMs) have become adept at generating coherent textual steps, L
External link:
http://arxiv.org/abs/2405.10122
This study investigates the existence of positional biases in Transformer-based models for text representation learning, particularly in the context of web document retrieval. We build on previous research that demonstrated loss of information in the
External link:
http://arxiv.org/abs/2404.04163
Significant strides have been made in natural language tasks, largely attributed to the emergence of powerful large language models (LLMs). These models, pre-trained on extensive and diverse corpora, have become increasingly capable of comprehending
External link:
http://arxiv.org/abs/2402.12969
Image captioning and cross-modal retrieval are examples of tasks that involve the joint analysis of visual and linguistic information. In connection with remote sensing imagery, these tasks can help non-expert users in extracting relevant Earth observa
External link:
http://arxiv.org/abs/2402.06475