Author: Changtao Li, Yi Wan, Feiran Yang, Jun Yang
Language: English
Year of publication: 2024
Source: EURASIP Journal on Audio, Speech, and Music Processing, Vol 2024, Iss 1, Pp 1-11 (2024)
Document type: article
ISSN: 1687-4722
DOI: 10.1186/s13636-024-00379-x
Description: Abstract: Synthesis artifacts spanning small to large scales are important cues for spoofing detection, yet few spoofing detection models leverage artifacts across different scales together. In this paper, we propose a spoofing detection system built on SincNet and Deep Layer Aggregation (DLA), which exploits speech representations at different levels to distinguish synthetic speech. DLA is fully convolutional with an iterative, tree-like structure. Its unique topology makes it possible to compound speech features from convolution layers at different depths, so that local and global speech representations are incorporated simultaneously. Moreover, SincNet is employed as the frontend feature extractor to avoid manual feature extraction and selection: it learns fine-grained features directly from the input speech waveform, making the proposed spoofing detection system end-to-end. The proposed system outperforms the baselines on the ASVspoof LA and DF datasets. Notably, our single model surpasses all competing systems in the ASVspoof DF competition with an equal error rate (EER) of 13.99%, which demonstrates the importance of multi-scale information aggregation for synthetic speech detection.
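To make the pipeline described in the abstract concrete, here is a minimal PyTorch sketch of the general idea, not the authors' implementation: a sinc-parameterized first convolution applied to the raw waveform (the SincNet-style frontend), followed by a small convolutional backbone that fuses feature maps from several depths before the spoof/bona fide decision. The class names (SincConv1d, MultiScaleBackbone), layer sizes, and the flat multi-scale concatenation are illustrative assumptions; the actual DLA backbone uses an iterative tree-structured aggregation rather than this simple fusion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SincConv1d(nn.Module):
    """Illustrative SincNet-style first layer: band-pass filters with learnable cutoffs."""

    def __init__(self, out_channels=32, kernel_size=251, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Learnable lower cutoff and bandwidth (in Hz) for each band-pass filter.
        self.low_hz = nn.Parameter(torch.linspace(30.0, sample_rate / 2 - 300.0, out_channels))
        self.band_hz = nn.Parameter(torch.full((out_channels,), 100.0))
        # Symmetric time axis (in seconds) and window used to build the sinc kernels.
        n = (torch.arange(kernel_size) - (kernel_size - 1) / 2) / sample_rate
        self.register_buffer("n", n)
        self.register_buffer("window", torch.hamming_window(kernel_size, periodic=False))

    def forward(self, x):                                    # x: (batch, 1, samples)
        low = torch.abs(self.low_hz)
        high = torch.clamp(low + torch.abs(self.band_hz), max=self.sample_rate / 2)

        # Ideal band-pass impulse response = difference of two low-pass sinc filters.
        def lowpass(fc):
            return 2 * fc.unsqueeze(1) * torch.sinc(2 * fc.unsqueeze(1) * self.n)

        filters = (lowpass(high) - lowpass(low)) * self.window
        filters = filters / (filters.abs().sum(dim=1, keepdim=True) + 1e-8)
        return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)


class MultiScaleBackbone(nn.Module):
    """Toy stand-in for DLA: fuse feature maps from several depths before classifying."""

    def __init__(self, in_ch=32, n_classes=2):
        super().__init__()

        def block(cin, cout):
            return nn.Sequential(nn.Conv1d(cin, cout, 3, stride=4, padding=1),
                                 nn.BatchNorm1d(cout), nn.ReLU())

        self.block1, self.block2, self.block3 = block(in_ch, 64), block(64, 64), block(64, 64)
        self.fuse = nn.Conv1d(64 * 3, 128, kernel_size=1)    # aggregation node
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):
        f1 = self.block1(x)                                  # shallow / local features
        f2 = self.block2(f1)
        f3 = self.block3(f2)                                 # deep / global features
        # Pool every scale to the coarsest temporal resolution, concatenate, and fuse.
        t = f3.shape[-1]
        fused = torch.cat([F.adaptive_avg_pool1d(f, t) for f in (f1, f2, f3)], dim=1)
        fused = F.relu(self.fuse(fused))
        return self.classifier(fused.mean(dim=-1))           # global average pooling over time


if __name__ == "__main__":
    frontend, backbone = SincConv1d(), MultiScaleBackbone()
    waveform = torch.randn(4, 1, 64600)                      # batch of ~4 s utterances at 16 kHz
    print(backbone(frontend(waveform)).shape)                # torch.Size([4, 2])
```

The sketch keeps the two properties the abstract emphasizes: the first layer operates directly on the waveform with interpretable band-pass filters instead of handcrafted spectral features, and the classifier sees features from several depths at once rather than only the deepest layer.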
Database: Directory of Open Access Journals