Showing 1 - 10 of 158 for search: '"Le, Vuong"'
Large vision-language models (LVLMs) offer a novel capability for performing in-context learning (ICL) in Visual QA. When prompted with a few demonstrations of image-question-answer triplets, LVLMs have demonstrated the ability to discern underlying
External link:
http://arxiv.org/abs/2407.01983
Humans are highly adaptable, swiftly switching between different modes to progressively handle different tasks, situations and contexts. In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scal
External link:
http://arxiv.org/abs/2307.12729
It would be a technological feat to be able to create a system that can hold a meaningful conversation with humans about what they watch. A setup toward that goal is presented as a video dialog task, where the system is asked to generate natural utte
External link:
http://arxiv.org/abs/2207.03656
The current success of modern visual reasoning systems is arguably attributed to cross-modality attention mechanisms. However, in deliberative reasoning such as in VQA, attention is unconstrained at each step, and thus may serve as a statistical pool
External link:
http://arxiv.org/abs/2205.12616
We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and children transient channels that are initiated a
External link:
http://arxiv.org/abs/2204.09875
Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, ma
External link:
http://arxiv.org/abs/2110.08253
Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands
External link:
http://arxiv.org/abs/2106.13432
Author:
Nguyen-Thai, Binh, Le, Vuong, Morgan, Catherine, Badawi, Nadia, Tran, Truyen, Venkatesh, Svetha
The absence or abnormality of fidgety movements of joints or limbs is strongly indicative of cerebral palsy in infants. Developing computer-based methods for assessing infant movements in videos is pivotal for improved cerebral palsy screening. Most
External link:
http://arxiv.org/abs/2105.09783
Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors. The task demands new capabilities to integrate video processing, language understanding, binding abstract linguistic concepts to concrete visual art
External link:
http://arxiv.org/abs/2104.05166
Human activities can be learned from video. With effective modeling it is possible to discover not only the action labels but also the temporal structures of the activities such as the progression of the sub-activities. Automatically recognizing such
External link:
http://arxiv.org/abs/2103.02758