Showing 1 - 10 of 49 for search: '"Dang Yonghao"'
Author:
Tong, Xinyang, Ding, Pengxiang, Wang, Donglin, Zhang, Wenjie, Cui, Can, Sun, Mingyang, Fan, Yiguo, Zhao, Han, Zhang, Hongyin, Dang, Yonghao, Huang, Siteng, Lyu, Shangke
This paper addresses the inherent inference latency challenges associated with deploying multimodal large language models (MLLM) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter reduction tec
External link:
http://arxiv.org/abs/2412.15576
Real-time 2D keypoint detection plays an essential role in computer vision. Although CNN-based and Transformer-based methods have achieved breakthrough progress, they often fail to deliver superior performance and real-time speed. This paper introduc
External link:
http://arxiv.org/abs/2412.01422
The deployment of embodied navigation agents in safety-critical environments raises concerns about their vulnerability to adversarial attacks on deep neural networks. However, current attack methods often lack practicality due to challenges in transi
External link:
http://arxiv.org/abs/2409.10071
Previous methods usually only extract the image modality's information to recognize group activity. However, mining image information is approaching saturation, making it difficult to extract richer information. Therefore, extracting complementary in
External link:
http://arxiv.org/abs/2407.19820
Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, which has drawn widespread attention to the micro-expression recognition task. However, the micro-expression recognition task is challenging due to the sub
External link:
http://arxiv.org/abs/2406.07918
Multi-person pose estimation (MPPE) presents a formidable yet crucial challenge in computer vision. Most existing methods predominantly concentrate on isolated interaction either between instances or joints, which is inadequate for scenarios demandin
External link:
http://arxiv.org/abs/2404.14025
Recently, 2D convolution has been found unqualified in sound event detection (SED). It enforces translation equivariance on sound events along the frequency axis, which is not a shift-invariant dimension. To address this issue, dynamic convolution is use
External link:
http://arxiv.org/abs/2401.04976
Skeleton-based action recognition is a central task in human-computer interaction. However, most previous methods suffer from two issues: (i) semantic ambiguity arising from spatial-temporal information mixture; and (ii) overlooking the explicit expl
External link:
http://arxiv.org/abs/2312.15144
Human Pose Estimation (HPE) plays a crucial role in computer vision applications. However, it is difficult to deploy state-of-the-art models on resource-limited devices due to the high computational costs of the networks. In this work, a binary human
External link:
http://arxiv.org/abs/2311.10296
Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. However, previous GCN-based methods rely excessively on elaborate human priors and construct complex feature aggregation mechanisms, which li
External link:
http://arxiv.org/abs/2308.16018