Zobrazeno 11 - 20
of 3 025
pro vyhledávání: '"Guo, Dan"'
This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to answer questions derived from untrimmed audible videos. To generate accurate answers, an AVQA model is expected to find the most informative audio-visual clues relevan
Externí odkaz:
http://arxiv.org/abs/2312.12816
Video Motion Magnification (VMM) aims to break the resolution limit of human visual perception capability and reveal the imperceptible minor motion that contains valuable information in the macroscopic domain. However, challenges arise in this task d
Externí odkaz:
http://arxiv.org/abs/2312.04152
Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relatio
Externí odkaz:
http://arxiv.org/abs/2310.09147
Make-up temporal video grounding (MTVG) aims to localize the target video segment which is semantically related to a sentence describing a make-up activity, given a long video. Compared with the general video grounding task, MTVG focuses on meticulou
Externí odkaz:
http://arxiv.org/abs/2309.06176
In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation aims to predict the continuous arousal and valence values of a participant bas
Externí odkaz:
http://arxiv.org/abs/2308.13421
Remote photoplethysmography (rPPG) based physiological measurement is an emerging yet crucial vision task, whose challenge lies in exploring accurate rPPG prediction from facial videos accompanied by noises of illumination variations, facial occlusio
Externí odkaz:
http://arxiv.org/abs/2308.07771
Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice. To reduce false positives, we identify three challenges: (1) unlike
Externí odkaz:
http://arxiv.org/abs/2308.06420
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in complex interaction between video and query, overemphasizing cross-mod
Externí odkaz:
http://arxiv.org/abs/2308.06009
In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023. The solution covers three sub-challenges: bodily behavior recognition, eye contact detection, and next speaker prediction. W
Externí odkaz:
http://arxiv.org/abs/2308.01526
In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micros-gesture Classification in the MiGA challenge at IJCAI 2023. The micro-gesture classification task aims at recognizing the action category of a given video based on t
Externí odkaz:
http://arxiv.org/abs/2307.10624