Search results

Report

Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering

Authors: Li, Zhangbin; Guo, Dan; Zhou, Jinxing; Zhang, Jing; Wang, Meng

This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to answer questions derived from untrimmed audible videos. To generate accurate answers, an AVQA model is expected to find the most informative audio-visual clues relevan

External link: http://arxiv.org/abs/2312.12816

Report

EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer

Authors: Wang, Fei; Guo, Dan; Li, Kun; Wang, Meng

Video Motion Magnification (VMM) aims to break the resolution limit of human visual perception capability and reveal the imperceptible minor motion that contains valuable information in the macroscopic domain. However, challenges arise in this task d

External link: http://arxiv.org/abs/2312.04152

Report

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA

Authors: Zhou, Sheng; Guo, Dan; Li, Jia; Yang, Xun; Wang, Meng

Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relatio

External link: http://arxiv.org/abs/2310.09147

Report

Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding

Authors: Li, Jiaxiu; Li, Kun; Li, Jia; Chen, Guoliang; Guo, Dan; Wang, Meng

Make-up temporal video grounding (MTVG) aims to localize the target video segment which is semantically related to a sentence describing a make-up activity, given a long video. Compared with the general video grounding task, MTVG focuses on meticulou

External link: http://arxiv.org/abs/2309.06176

Report

Exploiting Diverse Feature for Multimodal Sentiment Analysis

Authors: Li, Jia; Qian, Wei; Li, Kun; Li, Qi; Guo, Dan; Wang, Meng

In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation aims to predict the continuous arousal and valence values of a participant bas

External link: http://arxiv.org/abs/2308.13421

Report

Dual-path TokenLearner for Remote Photoplethysmography-based Physiological Measurement with Facial Videos

Authors: Qian, Wei; Guo, Dan; Li, Kun; Tian, Xilan; Wang, Meng

Remote photoplethysmography (rPPG) based physiological measurement is an emerging yet crucial vision task, whose challenge lies in exploring accurate rPPG prediction from facial videos accompanied by noises of illumination variations, facial occlusio

External link: http://arxiv.org/abs/2308.07771

Report

M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector

Authors: Vu, Yen Nhi Truong; Guo, Dan; Taha, Ahmed; Su, Jason; Matthews, Thomas Paul

Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice. To reduce false positives, we identify three challenges: (1) unlike

External link: http://arxiv.org/abs/2308.06420

Report

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer

Authors: Li, Kun; Guo, Dan; Wang, Meng

The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in complex interaction between video and query, overemphasizing cross-mod

External link: http://arxiv.org/abs/2308.06009

Report

Data Augmentation for Human Behavior Analysis in Multi-Person Conversations

Authors: Li, Kun; Guo, Dan; Chen, Guoliang; Liu, Feiyang; Wang, Meng

In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023. The solution covers three sub-challenges: bodily behavior recognition, eye contact detection, and next speaker prediction. W

External link: http://arxiv.org/abs/2308.01526

Report

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification

Authors: Li, Kun; Guo, Dan; Chen, Guoliang; Peng, Xinge; Wang, Meng

In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micro-gesture Classification in the MiGA challenge at IJCAI 2023. The micro-gesture classification task aims at recognizing the action category of a given video based on t

External link: http://arxiv.org/abs/2307.10624
