Search results - "Yousefi, Midia"

Report

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Authors: Le, Chenyang; Qian, Yao; Wang, Dongmei; Zhou, Long; Liu, Shujie; Wang, Xiaofei; Yousefi, Midia; Qian, Yanmin; Li, Jinyu; Zhao, Sheng; Zeng, Michael

There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeli

External link: http://arxiv.org/abs/2405.17809

Report

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Authors: Zhang, Leying; Qian, Yao; Zhou, Long; Liu, Shujie; Wang, Dongmei; Wang, Xiaofei; Yousefi, Midia; Qian, Yanmin; Li, Jinyu; He, Lei; Zhao, Sheng; Zeng, Michael

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a chal

External link: http://arxiv.org/abs/2404.06690

Report

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Authors: Wang, Dongmei; Xiao, Xiong; Kanda, Naoyuki; Yousefi, Midia; Yoshioka, Takuya; Wu, Jian

Target-Speaker Voice Activity Detection (TS-VAD) utilizes a set of speaker profiles alongside an input audio signal to perform speaker diarization. While its superiority over conventional methods has been demonstrated, the method can suffer from erro

External link: http://arxiv.org/abs/2309.12521

Report

Single-channel speech separation using Soft-minimum Permutation Invariant Training

Authors: Yousefi, Midia; Hansen, John H. L.

The goal of speech separation is to extract multiple speech sources from a single microphone recording. Recently, with the advancement of deep learning and availability of large datasets, speech separation has been formulated as a supervised learning

External link: http://arxiv.org/abs/2111.08635

Report

Speaker conditioning of acoustic models using affine transformation for multi-speaker speech recognition

Authors: Yousefi, Midia; Hansen, John H. L.

This study addresses the problem of single-channel Automatic Speech Recognition of a target speaker within an overlap speech scenario. In the proposed method, the hidden representations in the acoustic model are modulated by speaker auxiliary informa

External link: http://arxiv.org/abs/2111.00320

Report

Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

Authors: Yousefi, Midia; Hansen, John H. L.

Most current speech technology systems are designed to operate well even in the presence of multiple active speakers. However, most solutions assume that the number of concurrent speakers is known. Unfortunately, this information might not always be

External link: http://arxiv.org/abs/2111.00316

Report

Frame-based overlapping speech detection using Convolutional Neural Networks

Authors: Yousefi, Midia; Hansen, John H. L.

Naturalistic speech recordings usually contain speech signals from multiple speakers. This phenomenon can degrade the performance of speech technologies due to the complexity of tracing and recognizing individual speakers. In this study, we investiga

External link: http://arxiv.org/abs/2001.09937

Report

Probabilistic Permutation Invariant Training for Speech Separation

Authors: Yousefi, Midia; Khorram, Soheil; Hansen, John H. L.

Single-microphone, speaker-independent speech separation is normally performed through two steps: (i) separating the specific speech sources, and (ii) determining the best output-label assignment to find the separation error. The second step is the m

External link: http://arxiv.org/abs/1908.01768

Academic article

Single-channel speech separation using soft-minimum permutation invariant training

Authors: Yousefi, Midia; Hansen, John H. L.

Published in: Speech Communication, June 2023, 151:76-85
