Authors: |
Gui, Suying, Zhou, Chuan, Wang, Hao, Gao, Tiegang |
Subject: |
|
Source: |
Electronics (2079-9292); Aug2023, Vol. 12 Issue 15, p3309, 13p |
Abstract: |
With the rapid development of big data, artificial intelligence, and Internet technologies, human–human contact and human–machine interaction have led to an explosion of voice data. Rapidly identifying a speaker, and retrieving and managing that speaker's recordings within massive volumes of speech data, have become major challenges for intelligent speech applications in the field of information security. This research proposes a vocal recognition technique based on information adversarial training for speaker identity recognition in massive audio and video data, with speaker identification oriented to the information security domain. The experimental results show that the method projects data from different scenes and channels onto the same space and dynamically generates interactive speaker representations. It solves the channel mismatch problem and effectively improves recognition of a speaker's voice patterns across channels and scenes. It can separate overlapping voices when multiple people speak at the same time, reducing speaker separation errors. It achieves a recall rate of 89% on a large database for speaker voice recognition in the information security field, which is of practical value for intelligent applications. [ABSTRACT FROM AUTHOR] |
Database: |
Complementary Index |