Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation
Author: | Houqiang Li, Hao Zhou, Wengang Zhou, Yun Zhou |
---|---|
Year of publication: | 2022 |
Subject: |
Facial expression;
business.industry; Computer science; Speech recognition; Deep learning; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; Sign language; Computer Science Applications; Rule-based machine translation; Discriminative model; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; 020201 artificial intelligence & image processing; Sequence learning; Artificial intelligence; Electrical and Electronic Engineering; business; Pose; Sensory cue |
Source: | IEEE Transactions on Multimedia. 24:768-779 |
ISSN: | 1941-0077 1520-9210 |
DOI: | 10.1109/tmm.2021.3059098 |
Description: | Despite the recent success of deep learning in video-related tasks, deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative content. This characteristic heavily constrains their capability to learn the implicit visual grammars in sign videos that lie behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). To this end, we approach video-based sign language understanding with multi-cue learning and propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module learns spatial representations of different cues with a self-contained pose estimation branch. The TMC module models temporal correlations from intra-cue and inter-cue perspectives to explore the collaboration of multiple cues. A joint optimization strategy and a segmented attention mechanism are designed to make the best of multi-cue sources for sign language recognition and translation. To validate its effectiveness, we perform experiments on three large-scale sign language benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks. |
Database: | OpenAIRE |
External link: |
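
The intra-cue and inter-cue perspectives named in the description can be illustrated with a toy sketch. This is not the authors' implementation: the function names `temporal_smooth` and `tmc_block`, the cue names, and the use of a centred moving average in place of a learned temporal convolution are all assumptions made for illustration. It only shows the data flow: each cue is modelled over time separately, while a second path models the per-frame concatenation of all cues.

```python
from typing import Dict, List, Tuple

Sequence = List[List[float]]  # T frames, each a feature vector


def temporal_smooth(seq: Sequence, kernel: int = 3) -> Sequence:
    """Hypothetical stand-in for a learned temporal convolution:
    a centred moving average over neighbouring frames."""
    half = kernel // 2
    out = []
    for t in range(len(seq)):
        # Window is truncated at the sequence boundaries.
        window = seq[max(0, t - half): t + half + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def tmc_block(cues: Dict[str, Sequence], kernel: int = 3) -> Tuple[Dict[str, Sequence], Sequence]:
    """Toy temporal multi-cue block (assumed structure, not from the paper)."""
    # Intra-cue path: every cue is modelled over time on its own.
    intra = {name: temporal_smooth(seq, kernel) for name, seq in cues.items()}
    # Inter-cue path: concatenate all cues frame by frame, then model over time.
    names = sorted(cues)
    n_frames = len(cues[names[0]])
    fused = [[x for name in names for x in cues[name][t]] for t in range(n_frames)]
    inter = temporal_smooth(fused, kernel)
    return intra, inter


# Made-up features for three frames of three cues.
cues = {
    "hand": [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]],
    "face": [[0.0], [0.0], [6.0]],
    "pose": [[2.0], [2.0], [2.0]],
}
intra, inter = tmc_block(cues)
```

In this sketch the intra-cue output keeps one sequence per cue, whereas the inter-cue output is a single sequence whose feature width is the sum of the cue widths. A real model would replace the moving average with trainable layers.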