Showing 1 - 10 of 156 for search: '"Ban Chao"'
Author:
Chen, Jin, Ma, Kaijing, Huang, Haojian, Shen, Jiayu, Fang, Han, Zang, Xianghao, Ban, Chao, He, Zhongjiang, Sun, Hao, Kang, Yanmei
The development of multi-modal models has been rapidly advancing, with some demonstrating remarkable capabilities. However, annotating video-text pairs remains expensive and insufficient. Take video question answering (VideoQA) tasks as an example, …
External link:
http://arxiv.org/abs/2410.02768
Author:
Huang, Haojian, Qin, Chuanyu, Liu, Zhe, Ma, Kaijing, Chen, Jin, Fang, Han, Ban, Chao, Sun, Hao, He, Zhongjiang
Multi-view classification (MVC) faces inherent challenges due to domain gaps and inconsistencies across different views, often resulting in uncertainties during the fusion process. While Evidential Deep Learning (EDL) has been effective in addressing …
External link:
http://arxiv.org/abs/2409.00755
Author:
Ma, Kaijing, Huang, Haojian, Chen, Jin, Chen, Haodong, Ji, Pengliang, Zang, Xianghao, Fang, Han, Ban, Chao, Sun, Hao, Chen, Mulin, Li, Xuelong
Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos. This leads to unreliable predictions for noisy, corrupted, and out-of-distribution data.
External link:
http://arxiv.org/abs/2408.16272
Author:
Ma, Kaijing, Fang, Han, Zang, Xianghao, Ban, Chao, Zhou, Lanxiang, He, Zhongjiang, Li, Yongxiang, Sun, Hao, Feng, Zerun, Hou, Xingsong
Video Moment Retrieval, which aims to locate in-context video moments according to a natural language query, is an essential task for cross-modal grounding. Existing methods focus on enhancing the cross-modal interactions between all moments and the …
External link:
http://arxiv.org/abs/2408.07600
Author:
Fang, Han, Zang, Xianghao, Ban, Chao, Feng, Zerun, Zhou, Lanxiang, He, Zhongjiang, Li, Yongxiang, Sun, Hao
Text-video retrieval aims to find the most relevant cross-modal samples for a given query. Recent methods focus on modeling the whole spatial-temporal relations. However, since video clips contain more diverse content than captions, the model aligning …
External link:
http://arxiv.org/abs/2404.12216
Recently, masked video modeling has been widely explored and has significantly improved the model's understanding of visual regions at a local level. However, existing methods usually adopt random masking and follow the same reconstruction paradigm …
External link:
http://arxiv.org/abs/2305.07910
Published in:
Computers and Electronics in Agriculture, August 2024, Vol. 223
Video affective understanding, which aims to predict the expressions evoked by video content, is desired for video creation and recommendation. In the recent EEV challenge, a dense affective understanding task is proposed and requires frame-level …
External link:
http://arxiv.org/abs/2106.09964
Author:
Ban, Chao (ban_chao@outlook.com), Tian, Xingzhou (qlu@gzu.edu.cn), Lu, Qi, Lounglawan, Pipat (pipat@sut.ac.th), Wen, Guilan (pipat@sut.ac.th)
Published in:
Animals (ISSN 2076-2615), April 2024, Vol. 14, Issue 8, p. 1139, 17 pp.
Author:
Wang, Peng, He, Jinlong, Ma, Xueying, Weng, Lixin, Wu, Qiong, Zhao, Pengfei, Ban, Chao, Hao, Xiangcheng, Hao, Zhiyue, Yuan, Pengxuan, Hao, Fene, Wang, Shaoyu, Zhang, Huapeng, Xie, Shenghui, Gao, Yang
Published in:
Academic Radiology, July 2023, 30(7):1238-1246