Abstract: |
Hand gesture recognition (HGR) is receiving enormous attention because it facilitates communication in various applications, including human-computer interaction. However, HGR systems face various challenges arising from environmental conditions, rotation, scaling, illumination variations, etc. This paper proposes SpAtNet, a lightweight, portable CNN-based spatial feature attention network that learns spatial features for precise hand gesture recognition. SpAtNet primarily consists of two blocks: a multi-scale attentive feature fusion (MAFF) block and an interleaved module. The MAFF block employs multi-scale filters (1 × 1, 3 × 3, and 5 × 5) to extract rich spatial information, which improves the robustness of the HGR system. Within the MAFF block, small filters encode fine-scale features while larger filters extract coarse features. The interleaved module is built by sequentially stacking four convolutional layers with kernel sizes of 3 × 3 and 5 × 5, and is introduced to learn the high-level contextual features crucial for efficient recognition. The proposed algorithm is validated on six benchmark datasets: MUGD, ASL Finger Spelling, NUS-II, HGR-I, Triesch, and ArASL. The comparative analysis and visual representation show that the proposed approach outperforms other state-of-the-art techniques. [ABSTRACT FROM AUTHOR] |
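
The multi-scale filtering described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' MAFF block: the abstract does not specify the attention or fusion step, so both are omitted, and fixed averaging kernels stand in for learned weights. The names `conv2d_same` and `multi_scale_features` are hypothetical. The sketch only shows how parallel 1 × 1, 3 × 3, and 5 × 5 branches yield same-sized feature maps with different receptive fields.

```python
# Illustrative sketch only (assumptions): averaging kernels replace learned
# weights, and the attention/fusion step of the MAFF block is not modelled.

def conv2d_same(image, k):
    """Apply a k x k averaging filter with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    r = k // 2  # half-width of the kernel
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    # Positions outside the image count as zero (zero padding).
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x]
            out[i][j] = total / (k * k)
    return out


def multi_scale_features(image, kernel_sizes=(1, 3, 5)):
    """Return one feature map per kernel size, stacked like channels."""
    return [conv2d_same(image, k) for k in kernel_sizes]


# A 5 x 5 image with a single bright pixel in the centre.
image = [[float(i == 2 and j == 2) for j in range(5)] for i in range(5)]
features = multi_scale_features(image)
```

In this toy input, the 1 × 1 branch reproduces the image unchanged, the 3 × 3 branch spreads the bright pixel over its immediate neighbours, and the 5 × 5 branch spreads it over the whole image, which is the fine-versus-coarse distinction the abstract draws.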