Showing 1 - 10 of 195 for search: '"Sahidullah, Md"'
Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this st
External link:
http://arxiv.org/abs/2410.00023
In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024. Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and mu
External link:
http://arxiv.org/abs/2409.15356
Despite the promising performance of state-of-the-art approaches for Parkinson's Disease (PD) detection, these approaches often analyze individual speech segments in isolation, which can lead to suboptimal results. Dysarthric cues that characterize sp
External link:
http://arxiv.org/abs/2409.07884
Authors:
Wang, Xin; Delgado, Hector; Tak, Hemlata; Jung, Jee-weon; Shim, Hye-jin; Todisco, Massimiliano; Kukanov, Ivan; Liu, Xuechen; Sahidullah, Md; Kinnunen, Tomi; Evans, Nicholas; Lee, Kong Aik; Yamagishi, Junichi
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions. Compared to previous challenges, the ASVspoof 5 database is built from crowdsourced data
External link:
http://arxiv.org/abs/2408.08739
Current trends in audio anti-spoofing detection research strive to improve models' ability to generalize across unseen attacks by learning to identify a variety of spoofing artifacts. This emphasis has primarily focused on the spoof class. Recently,
External link:
http://arxiv.org/abs/2406.17246
Authors:
Singh, Vishwanath Pratap; Malato, Federico; Hautamaki, Ville; Sahidullah, Md; Kinnunen, Tomi
Published in:
Interspeech 2024
While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approaches associated with balancing the right amount of augmente
External link:
http://arxiv.org/abs/2406.09999
The state-of-the-art audio deepfake detectors leveraging deep neural networks exhibit impressive recognition performance. Nonetheless, this advantage is accompanied by a significant carbon footprint. This is mainly due to the use of high-performance
External link:
http://arxiv.org/abs/2403.14290
Authors:
Raghav, Nikhil; Sahidullah, Md
Clustering speaker embeddings is crucial in speaker diarization but hasn't received as much focus as other components. Moreover, the robustness of speaker diarization across various datasets hasn't been explored when the development and evaluation da
External link:
http://arxiv.org/abs/2403.14286
The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children
External link:
http://arxiv.org/abs/2402.15214
It is now well known that automatic speaker verification (ASV) systems can be spoofed using various types of adversaries. The usual approach to protect ASV systems against such attacks is to develop a separate spoofing countermeasure (CM) module t
External link:
http://arxiv.org/abs/2401.11156