Search results

Report

SADL: An Effective In-Context Learning Method for Compositional Visual QA

Authors: Dang, Long Hoang; Le, Thao Minh; Le, Vuong; Phuong, Tu Minh; Tran, Truyen

Large vision-language models (LVLMs) offer a novel capability for performing in-context learning (ICL) in Visual QA. When prompted with a few demonstrations of image-question-answer triplets, LVLMs have demonstrated the ability to discern underlying

External link: http://arxiv.org/abs/2407.01983

Report

Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction

Authors: Tran, Hung; Le, Vuong; Venkatesh, Svetha; Tran, Truyen

Humans are highly adaptable, swiftly switching between different modes to progressively handle different tasks, situations and contexts. In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scal

External link: http://arxiv.org/abs/2307.12729

Report

Video Dialog as Conversation about Objects Living in Space-Time

Authors: Pham, Hoang-Anh; Le, Thao Minh; Le, Vuong; Phuong, Tu Minh; Tran, Truyen

It would be a technological feat to be able to create a system that can hold a meaningful conversation with humans about what they watch. A setup toward that goal is presented as a video dialog task, where the system is asked to generate natural utte

External link: http://arxiv.org/abs/2207.03656

Report

Guiding Visual Question Answering with Attention Priors

Authors: Le, Thao Minh; Le, Vuong; Gupta, Sunil; Venkatesh, Svetha; Tran, Truyen

The current success of modern visual reasoning systems is arguably attributed to cross-modality attention mechanisms. However, in deliberative reasoning such as in VQA, attention is unconstrained at each step, and thus may serve as a statistical pool

External link: http://arxiv.org/abs/2205.12616

Report

Persistent-Transient Duality in Human Behavior Modeling

Authors: Tran, Hung; Le, Vuong; Venkatesh, Svetha; Tran, Truyen

We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and children transient channels that are initiated a

External link: http://arxiv.org/abs/2204.09875

Report

A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research

Authors: Quinn, Thomas P; Gupta, Sunil; Venkatesh, Svetha; Le, Vuong

Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, ma

External link: http://arxiv.org/abs/2110.08253

Report

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

Authors: Dang, Long Hoang; Le, Thao Minh; Le, Vuong; Tran, Truyen

Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands

External link: http://arxiv.org/abs/2106.13432

Report

A Spatio-temporal Attention-based Model for Infant Movement Assessment from Videos

Authors: Nguyen-Thai, Binh; Le, Vuong; Morgan, Catherine; Badawi, Nadia; Tran, Truyen; Venkatesh, Svetha

The absence or abnormality of fidgety movements of joints or limbs is strongly indicative of cerebral palsy in infants. Developing computer-based methods for assessing infant movements in videos is pivotal for improved cerebral palsy screening. Most

External link: http://arxiv.org/abs/2105.09783

Report

Object-Centric Representation Learning for Video Question Answering

Authors: Dang, Long Hoang; Le, Thao Minh; Le, Vuong; Tran, Truyen

Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors. The task demands new capabilities to integrate video processing, language understanding, binding abstract linguistic concepts to concrete visual art

External link: http://arxiv.org/abs/2104.05166

Report

Learning Asynchronous and Sparse Human-Object Interaction in Videos

Authors: Morais, Romero; Le, Vuong; Venkatesh, Svetha; Tran, Truyen

Human activities can be learned from video. With effective modeling it is possible to discover not only the action labels but also the temporal structures of the activities such as the progression of the sub-activities. Automatically recognizing such

External link: http://arxiv.org/abs/2103.02758
