Description: |
The unprecedented success of speech recognition has stimulated the wide use of intelligent audio systems. These systems open new opportunities for attacks that compromise user privacy by eavesdropping on loudspeakers. Existing effective eavesdropping methods either use a high-speed camera, which relies on line of sight (LOS) to measure object vibrations, or use a WiFi MIMO antenna array, which requires a quiet environment. In this paper, we explore the possibility of eavesdropping on loudspeakers using commercial off-the-shelf (COTS) RFID tags, which are widely deployed throughout our daily lives. We propose Tag-Bug, which targets the human voice with its complex frequency bands and performs through-the-wall eavesdropping on loudspeakers by capturing sub-millimeter vibrations. Tag-Bug extracts sound characteristics through two effects: (1) the vibration effect, where the tag itself vibrates in response to sound; and (2) the reflection effect, where the tag does not vibrate but senses signals reflected from nearby vibrating objects. To amplify the influence of vibration on the received signal, we design a new signal feature, referred to as the Modulated Signal Difference (MSD), to reconstruct sound from RF signals. To improve the quality of the reconstructed sound for human voice recognition, we apply a Conditional Generative Adversarial Network (CGAN) to recover the full frequency band from the partial frequency band of the reconstructed sound. Extensive experiments on the USRP platform show that Tag-Bug can successfully capture monotone sounds louder than 60 dB. Tag-Bug recognizes spoken numbers with 95.3%, 85.3%, and 87.5% precision in free-space, through-brick-wall, and through-insulating-glass eavesdropping, respectively. It also recognizes spoken letters with 87% precision in free-space eavesdropping.