Showing 1 - 5 of 5 for search: '"Mu, Zhaoxi"'
Author:
Mu, Zhaoxi, Yang, Xinyu
The integration of visual cues has revitalized the performance of the target speech extraction task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm often encounters the challenge of modality imbalance. In
External link:
http://arxiv.org/abs/2404.12725
Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference
External link:
http://arxiv.org/abs/2312.10305
Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important. In this p
External link:
http://arxiv.org/abs/2303.03737
In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage en
External link:
http://arxiv.org/abs/2303.03732
As an indispensable part of modern human-computer interaction systems, speech synthesis technology helps users get the output of intelligent machines more easily and intuitively, and has thus attracted more and more attention. Due to the limitations of hig
External link:
http://arxiv.org/abs/2104.09995