Multi-Frequency RF Sensor Fusion for Word-Level Fluent ASL Recognition
Author: | Ali Cafer Gurbuz, Evie Malaia, Emre Kurtoglu, Darrin J. Griffin, Chris S. Crawford, M. Mahbubur Rahman, Sevgi Zubeyde Gurbuz |
---|---|
Year of publication: | 2022 |
Subject: |
American Sign Language; Computer science; Deep learning; Speech recognition; Sensor fusion; Gesture recognition; Feature (machine learning); ComputingMilieux_COMPUTERSANDSOCIETY; Visual communication; Artificial intelligence; Electrical and Electronic Engineering; Instrumentation; Wireless sensor network; Gesture |
Source: | IEEE Sensors Journal. 22:11373-11381 |
ISSN: | 2379-9153 1530-437X |
DOI: | 10.1109/jsen.2021.3078339 |
Description: | Deaf spaces are unique indoor environments designed to optimize visual communication and Deaf cultural expression. However, much of the technological research geared towards the deaf involves the use of video or wearables for American Sign Language (ASL) translation, with little consideration for the Deaf perspective on privacy and usability of the technology. In contrast to video, RF sensors offer an avenue for ambient ASL recognition while also preserving privacy for Deaf signers. Methods: This paper investigates the RF transmit waveform parameters required for effective measurement of ASL signs and their effect on word-level classification accuracy attained with transfer learning and convolutional autoencoders (CAE). A multi-frequency fusion network is proposed to exploit data from all sensors in an RF sensor network and improve the recognition accuracy of fluent ASL signing. Results: For fluent signers, CAEs yield a 20-sign classification accuracy of 76% at 77 GHz and 73% at 24 GHz, while at X-band (10 GHz) accuracy drops to 67%. For hearing imitation signers, signs are more separable, resulting in a 96% accuracy with CAEs. Further, fluent ASL recognition accuracy is significantly increased with use of the multi-frequency fusion network, which boosts the 20-sign fluent ASL recognition accuracy to 95%, surpassing conventional feature-level fusion by 12%. Implications: Signing involves finer spatiotemporal dynamics than typical hand gestures, and thus requires interrogation with a transmit waveform that has a rapid succession of pulses and high bandwidth. Millimeter-wave RF frequencies also yield greater accuracy due to the increased Doppler spread of the radar backscatter. Comparative analysis of articulation dynamics also shows that imitation signing is not representative of fluent signing, and is not effective in pre-training networks for fluent ASL classification.
Deep neural networks employing multi-frequency fusion capture both shared and sensor-specific features, and thus offer significant performance gains in comparison to using a single sensor or feature-level fusion. |
Database: | OpenAIRE |
External link: |
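
The abstract attributes the higher accuracy at millimeter-wave frequencies to the increased Doppler spread of the radar backscatter. As a rough illustration of why carrier frequency matters, the sketch below applies the standard monostatic radar Doppler relation f_d = 2 v f_c / c at the three carrier frequencies named in the abstract. The hand speed of 1 m/s and all function and variable names are assumptions made for this example; they are not taken from the paper.

```python
# Illustrative sketch only: monostatic radar Doppler shift, f_d = 2 * v * f_c / c.
# The hand speed and all names below are assumptions, not values from the paper.

SPEED_OF_LIGHT_M_S = 299_792_458.0

# Assumed radial speed of a signer's hand (hypothetical, for illustration).
ASSUMED_HAND_SPEED_M_S = 1.0

# Carrier frequencies mentioned in the abstract.
CARRIERS_HZ = {
    "X-band (10 GHz)": 10e9,
    "24 GHz": 24e9,
    "77 GHz": 77e9,
}


def doppler_shift_hz(carrier_hz: float, radial_velocity_m_s: float) -> float:
    """Return the Doppler shift in Hz for a target moving radially at the given speed."""
    return 2.0 * radial_velocity_m_s * carrier_hz / SPEED_OF_LIGHT_M_S


def doppler_table(radial_velocity_m_s: float = ASSUMED_HAND_SPEED_M_S) -> dict:
    """Map each carrier label to its Doppler shift at the given radial speed."""
    return {
        label: doppler_shift_hz(carrier_hz, radial_velocity_m_s)
        for label, carrier_hz in CARRIERS_HZ.items()
    }


if __name__ == "__main__":
    for label, shift_hz in doppler_table().items():
        print(f"{label}: {shift_hz:.1f} Hz")
```

Because the shift scales linearly with carrier frequency, the same motion produces a Doppler shift 7.7 times larger at 77 GHz than at 10 GHz, which is consistent with the abstract's point that fine articulation dynamics are easier to resolve at millimeter-wave frequencies.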