Search results

Report

Scene-Text Grounding for Text-Based Video Question Answering

Authors: Zhou, Sheng, Xiao, Junbin, Yang, Xun, Song, Peipei, Guo, Dan, Yao, Angela, Wang, Meng, Chua, Tat-Seng

Existing efforts in text-based video question answering (TextVideoQA) are criticized for their opaque decision-making and heavy reliance on scene-text recognition. In this paper, we propose to study Grounded TextVideoQA by forcing models to answer que

External link: http://arxiv.org/abs/2409.14319

Report

Prototype Learning for Micro-gesture Classification

Authors: Chen, Guoliang, Wang, Fei, Li, Kun, Wu, Zhiliang, Fan, Hehe, Yang, Yi, Wang, Meng, Guo, Dan

In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the Micro-gesture Classification track in the MiGA challenge at IJCAI 2024. The micro-gesture classification task involves recognizing the category of a

External link: http://arxiv.org/abs/2408.03097

Report

Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

Authors: Zhou, Jinxing, Guo, Dan, Mao, Yuxin, Zhong, Yiran, Chang, Xiaojun, Wang, Meng

The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improvin

External link: http://arxiv.org/abs/2407.08126

Report

PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

Authors: Hu, Jinpeng, Dong, Tengteng, Gang, Luo, Ma, Hui, Zou, Peng, Sun, Xiao, Guo, Dan, Wang, Meng

Mental health has attracted substantial attention in recent years, and LLMs can be an effective technology for alleviating this problem owing to their capability in text understanding and dialogue. However, existing research in this domain often suffers

External link: http://arxiv.org/abs/2407.05721

Report

MMAD: Multi-label Micro-Action Detection in Videos

Authors: Li, Kun, Guo, Dan, Liu, Pengyu, Chen, Guoliang, Wang, Meng

Human body actions are an important form of non-verbal communication in social interactions. This paper focuses on a specific subset of body actions known as micro-actions, which are subtle, low-intensity body movements that provide a deeper understa

External link: http://arxiv.org/abs/2407.05311

Report

Micro-gesture Online Recognition using Learnable Query Points

Authors: Liu, Pengyu, Wang, Fei, Li, Kun, Chen, Guoliang, Wei, Yanyan, Tang, Shengeng, Wu, Zhiliang, Guo, Dan

In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the Micro-gesture Online Recognition track in the MiGA challenge at IJCAI 2024. The Micro-gesture Online Recognition task involves identifying the category and loca

External link: http://arxiv.org/abs/2407.04490

Report

Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement

Authors: Qian, Wei, Li, Qi, Li, Kun, Wang, Xinke, Sun, Xiao, Wang, Meng, Guo, Dan

This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024. The goal is to deve

External link: http://arxiv.org/abs/2406.04942

Report

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

Authors: Zhou, Jinxing, Guo, Dan, Zhong, Yiran, Wang, Meng

The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos. It is often performed in a weakly supervised manner, where only video event labels are pr

External link: http://arxiv.org/abs/2406.00919

Report

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Ren, Bin, Li, Yawei, Mehta, Nancy, Timofte, Radu, Yu, Hongyuan, Wan, Cheng, Hong, Yuxin, Han, Bingnan, Wu, Zhuoyuan, Zou, Yajun, Liu, Yuqing, Li, Jizhe, He, Keji, Fan, Chao, Zhang, Heng, Zhang, Xiaolin, Yin, Xuanwu, Zuo, Kunlong, Liao, Bohao, Xia, Peizhe, Peng, Long, Du, Zhibo, Di, Xin, Li, Wangkai, Wang, Yang, Zhai, Wei, Pei, Renjing, Guo, Jiaming, Xu, Songcen, Cao, Yang, Zha, Zhengjun, Wang, Yan, Liu, Yi, Wang, Qing, Zhang, Gang, Zhang, Liou, Zhao, Shijie, Sun, Long, Pan, Jinshan, Dong, Jiangxin, Tang, Jinhui, Liu, Xin, Yan, Min, Wang, Qian, Zhou, Menghan, Yan, Yiqiang, Liu, Yixuan, Chan, Wensong, Tang, Dehua, Zhou, Dong, Wang, Li, Tian, Lu, Emad, Barsoum, Jia, Bohan, Qiao, Junbo, Zhou, Yunshuai, Zhang, Yun, Li, Wei, Lin, Shaohui, Zhou, Shenglong, Chen, Binbin, Liao, Jincheng, Zhao, Suiyi, Zhang, Zhao, Wang, Bo, Luo, Yan, Wei, Yanyan, Li, Feng, Wang, Mingshen, Guan, Jinhan, Hu, Dehua, Yu, Jiawei, Xu, Qisheng, Sun, Tao, Lan, Long, Xu, Kele, Lin, Xin, Yue, Jingtong, Yang, Lehan, Du, Shiyi, Qi, Lu, Ren, Chao, Han, Zeyu, Wang, Yuhan, Chen, Chaolin, Li, Haobo, Zheng, Mingjun, Yang, Zhongbao, Song, Lianhong, Yan, Xingzhuo, Fu, Minghan, Zhang, Jingyi, Li, Baiang, Zhu, Qi, Xu, Xiaogang, Guo, Dan, Guo, Chunle, Chen, Jiadi, Long, Huanhuan, Duanmu, Chunjiang, Lei, Xiaoyan, Liu, Jie, Jia, Weilin, Cao, Weifeng, Zhang, Wenlong, Mao, Yanyu, Guo, Ruilong, Zhang, Nihao, Pandey, Manoj, Chernozhukov, Maksym, Le, Giang, Cheng, Shuli, Wang, Hongyuan, Wei, Ziyan, Tang, Qingting, Wang, Liejun, Li, Yongming, Guo, Yanhui, Xu, Hao, Khatami-Rizi, Akram, Mahmoudi-Aznaveh, Ahmad, Hsu, Chih-Chung, Lee, Chia-Ming, Chou, Yi-Shiuan, Joshi, Amogh, Akalwadi, Nikhil, Malagi, Sampada, Yashaswini, Palani, Desai, Chaitra, Tabib, Ramesh Ashok, Patil, Ujwala, Mudenagudi, Uma

This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor

External link: http://arxiv.org/abs/2404.10343

Report

Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

Authors: Hu, Jingjing, Guo, Dan, Li, Kun, Si, Zhan, Yang, Xun, Chang, Xiaojun, Wang, Meng

Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet) to learn the semantic association between the video and text/audio queries in a cross-mo

External link: http://arxiv.org/abs/2403.14174
