Showing 1 - 10 of 29 for search: '"Zhou, Benjia"'
Author:
Chen, Zhigang, Zhou, Benjia, Huang, Yiqing, Wan, Jun, Hu, Yibo, Shi, Hailin, Liang, Yanyan, Lei, Zhen, Zhang, Du
Sign Language Representation Learning (SLRL) is crucial for a range of sign language-related downstream tasks such as Sign Language Translation (SLT) and Sign Language Retrieval (SLRet). Recently, many gloss-based and gloss-free SLRL methods have been…
External link:
http://arxiv.org/abs/2408.09949
Author:
Chen, Zhigang, Zhou, Benjia, Li, Jun, Wan, Jun, Lei, Zhen, Jiang, Ning, Lu, Quan, Zhao, Guoqing
Previous Sign Language Translation (SLT) methods achieve superior performance by relying on gloss annotations. However, labeling high-quality glosses is a labor-intensive task, which limits the further development of SLT. Although some approaches work…
External link:
http://arxiv.org/abs/2403.12556
Author:
Han, Tianshun, Gui, Shengnan, Huang, Yiqing, Li, Baihui, Liu, Lijian, Zhou, Benjia, Jiang, Ning, Lu, Quan, Zhi, Ruicong, Liang, Yanyan, Zhang, Du, Wan, Jun
Speech-driven 3D facial animation has improved considerably in recent years, yet most related works utilize only the acoustic modality and neglect visual and textual cues, leading to unsatisfactory results in terms of precision and coherence. We argue…
External link:
http://arxiv.org/abs/2312.02781
RGB-D action and gesture recognition remains an interesting topic in human-centered scene understanding, primarily due to the multiple granularities and large variation in human motion. Although many RGB-D based action and gesture recognition approaches…
External link:
http://arxiv.org/abs/2308.12006
Author:
Zhou, Benjia, Chen, Zhigang, Clapés, Albert, Wan, Jun, Liang, Yanyan, Escalera, Sergio, Lei, Zhen, Zhang, Du
Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT…
External link:
http://arxiv.org/abs/2307.14768
Motion recognition is a promising direction in computer vision, but training video classification models is much harder than training image models due to insufficient data and considerable parameter counts. To get around this, some works strive to explore multimodal…
External link:
http://arxiv.org/abs/2211.09146
Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs), but the training of ViTs is much harder than that of CNNs. In this paper, we define several metrics, including Dynamic Data Proportion (DDP) and K…
External link:
http://arxiv.org/abs/2209.15006
Author:
Zhou, Benjia, Wang, Pichao, Wan, Jun, Liang, Yanyan, Wang, Fan, Zhang, Du, Lei, Zhen, Li, Hao, Jin, Rong
Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors. Although previous RGB-D-based motion recognition methods have achieved promising performance through the tightly coupled…
External link:
http://arxiv.org/abs/2112.09129
Human gesture recognition has drawn much attention in the area of computer vision. However, the performance of gesture recognition is always influenced by gesture-irrelevant factors like the background and the clothing of performers. Therefore, f…
External link:
http://arxiv.org/abs/2102.05348
Vehicle Re-identification (ReID) is an important yet challenging problem in computer vision. Compared to other visual objects like faces and persons, vehicles simultaneously exhibit much larger intraclass viewpoint variations and interclass visual similarities…
External link:
http://arxiv.org/abs/2011.06228