Showing 1 - 10 of 26 for search: '"Shi, Yanpei"'
Speech is usually used for constructing an automatic Alzheimer's dementia (AD) detection system, as the acoustic and linguistic abilities show a decline in people living with AD at the early stages. However, speech includes not only AD-related local
External link:
http://arxiv.org/abs/2410.07277
Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. This paper proposes a hierarchical network with transformer encoders and memory mechanism to address this problem. The proposed model conta
External link:
http://arxiv.org/abs/2010.16071
Many-to-many voice conversion with non-parallel training data has seen significant progress in recent years. StarGAN-based models have attracted interest for voice conversion. However, most of the StarGAN-based methods only focused on voice conversion exp
External link:
http://arxiv.org/abs/2010.11646
While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved performan
External link:
http://arxiv.org/abs/2005.07818
Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The use of a hierarc
External link:
http://arxiv.org/abs/2005.07817
Authors:
Shi, Yanpei; Hain, Thomas
Separating different speaker properties from a multi-speaker environment is challenging. Instead of separating a two-speaker signal in signal space like speech source separation, a speaker embedding de-mixing approach is proposed. The proposed approa
External link:
http://arxiv.org/abs/2001.06397
In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of individually
External link:
http://arxiv.org/abs/2001.05031
In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use of hierar
External link:
http://arxiv.org/abs/1910.07900
Authors:
Shi, Yanpei; Hain, Thomas
Embedding acoustic information into fixed length representations is of interest for a whole range of applications in speech and audio technology. Two novel unsupervised approaches to generate acoustic embeddings by modelling of acoustic context are p
External link:
http://arxiv.org/abs/1910.07601
While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. To improve robustness of speaker recognition system performance in noise, a n
External link:
http://arxiv.org/abs/1909.11200