Showing 1 - 10 of 287 results for search: "Alwan, Abeer A."
Recently, speech foundation models have gained popularity due to their superiority in finetuning downstream ASR tasks. However, models finetuned on certain domains, such as LibriSpeech (adult read speech), behave poorly on other domains (child or noi…
External link:
http://arxiv.org/abs/2406.10512
Speech foundation models (SFMs) have achieved state-of-the-art results for various speech tasks in supervised (e.g. Whisper) or self-supervised systems (e.g. WavLM). However, the performance of SFMs for child ASR has not been systematically studied.
External link:
http://arxiv.org/abs/2406.10507
Non-autoregressive automatic speech recognition (NASR) models have gained attention due to their parallelism and fast inference. The encoder-based NASR, e.g. connectionist temporal classification (CTC), can be initialized from the speech foundation m…
External link:
http://arxiv.org/abs/2402.08898
While speech-based depression detection methods that use speaker-identity features, such as speaker embeddings, are popular, they often compromise patient privacy. To address this issue, we propose a speaker disentanglement method that utilizes a non…
External link:
http://arxiv.org/abs/2306.01861
Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods include autoregressive predictive coding (APC), Wav2vec2.0, and hidden unit B…
External link:
http://arxiv.org/abs/2305.00115
Recently, end-to-end models have been widely used in automatic speech recognition (ASR) systems. Two of the most representative approaches are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. Autoregressiv…
External link:
http://arxiv.org/abs/2304.07611
Authors:
Afshan, Amber, Alwan, Abeer
Our prior experiments show that humans and machines seem to employ different approaches to speaker discrimination, especially in the presence of speaking style variability. The experiments examined read versus conversational speech. Listeners focused…
External link:
http://arxiv.org/abs/2206.13684
Attention-based conditioning methods using variable frame rate for style-robust speaker verification
Authors:
Afshan, Amber, Alwan, Abeer
We propose an approach to extract speaker embeddings that are robust to speaking style variations in text-independent speaker verification. Typically, speaker embedding extraction includes training a DNN for speaker classification and using the bottl…
External link:
http://arxiv.org/abs/2206.13680
Major Depressive Disorder (MDD) is a severe illness that affects millions of people, and it is critical to diagnose this disorder as early as possible. Detecting depression from voice signals can be of great help to physicians and can be done without…
External link:
http://arxiv.org/abs/2206.13016
Preserving a patient's identity is a challenge for automatic, speech-based diagnosis of mental health disorders. In this paper, we address this issue by proposing adversarial disentanglement of depression characteristics and speaker identity. The mod…
External link:
http://arxiv.org/abs/2206.09530