Search results

Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

Authors: Mu, Zhaoxi; Yang, Xinyu

The integration of visual cues has revitalized the performance of the target speech extraction task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm often encounters the challenge of modality imbalance. In

External link: http://arxiv.org/abs/2404.12725

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Authors: Mu, Zhaoxi; Yang, Xinyu; Sun, Sining; Yang, Qing

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference

External link: http://arxiv.org/abs/2312.10305

Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

Authors: Mu, Zhaoxi; Yang, Xinyu; Zhu, Wenjing

Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important. In this p

External link: http://arxiv.org/abs/2303.03737

A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

Authors: Mu, Zhaoxi; Yang, Xinyu; Yang, Xiangyuan; Zhu, Wenjing

In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage en

External link: http://arxiv.org/abs/2303.03732

Review of end-to-end speech synthesis technology based on deep learning

Authors: Mu, Zhaoxi; Yang, Xinyu; Dong, Yizhuo

As an indispensable part of modern human-computer interaction systems, speech synthesis technology helps users get the output of intelligent machines more easily and intuitively, and has thus attracted more and more attention. Due to the limitations of hig

External link: http://arxiv.org/abs/2104.09995
