Showing 1 - 10
of 943
for search: '"Liu, Jinxiang"'
Author:
Cheng, Haozhe, Ju, Cheng, Wang, Haicheng, Liu, Jinxiang, Chen, Mengting, Hu, Qiang, Zhang, Xiaoyun, Wang, Yanfeng
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) has recently gained increasing attention with the development of vision-language pre-training. To enable generalization to arbitrary classes, existing me…
External link:
http://arxiv.org/abs/2404.14890
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames. Although great progress has been witnessed, we experimentally reveal that current methods achieve only marginal performance gains from the use of unlabeled frames, le…
External link:
http://arxiv.org/abs/2403.11074
Published in:
Cailiao gongcheng, Vol 52, Iss 9, Pp 158-168 (2024)
To address the environmental uranium pollution caused by uranium-containing wastewater generated during nuclear energy development and application, g-C3N4 was loaded onto chitosan (CS) through a cross-linking method, and the g-C3N…
External link:
https://doaj.org/article/84fa54ce5fe045d8aad79adff6523959
The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in video frames using audio cues. However, current fusion-based methods have performance limitations due to the small receptive field of convolution and i…
External link:
http://arxiv.org/abs/2307.13236
The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks. Tackling the task involves a comprehensive consideration of both the data and model…
External link:
http://arxiv.org/abs/2305.11019
Published in:
Industrial Lubrication and Tribology, 2024, Vol. 76, Issue 5, pp. 703-715.
External link:
http://www.emeraldinsight.com/doi/10.1108/ILT-12-2023-0417
Author:
Ma, Chaofan, Yang, Yuhuan, Ju, Chen, Zhang, Fei, Liu, Jinxiang, Wang, Yu, Zhang, Ya, Wang, Yanfeng
Learning from a large corpus of data, pre-trained models have achieved impressive progress. As a popular form of generative pre-training, diffusion models capture both low-level visual knowledge and high-level semantic relations. In this paper, we pro…
External link:
http://arxiv.org/abs/2303.09813
Author:
Ju, Chen, Wang, Haicheng, Liu, Jinxiang, Ma, Chaofan, Zhang, Ya, Zhao, Peisen, Chang, Jianlong, Tian, Qi
Temporal sentence grounding aims to detect the event timestamps described by a natural language query in given untrimmed videos. The existing fully-supervised setting achieves great performance but incurs expensive annotation costs, while the w…
External link:
http://arxiv.org/abs/2302.09850
Author:
Ju, Chen, Zheng, Kunhao, Liu, Jinxiang, Zhao, Peisen, Zhang, Ya, Chang, Jianlong, Wang, Yanfeng, Tian, Qi
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action instances with only category labels. Most methods widely adopt the off-the-shelf Classification-Based Pre-training (CBP) to generate video features for action…
External link:
http://arxiv.org/abs/2212.09335
We present a simple yet effective self-supervised framework for audio-visual representation learning that localizes the sound source in videos. To understand what enables learning useful representations, we systematically investigate the effects of dat…
External link:
http://arxiv.org/abs/2206.12772