Search results - "Wan, Xucheng"

Report

XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

Authors: Wan, Xucheng, Zheng, Naijun, Liu, Kai, Zhou, Huan

Contextualized ASR models have been shown to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle in bilingual settings, which are prevalent in code…

External link: http://arxiv.org/abs/2408.10524

Report

An efficient text augmentation approach for contextualized Mandarin speech recognition

Authors: Zheng, Naijun, Wan, Xucheng, Liu, Kai, Du, Ziqing, Zhou, Huan

Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherently limited availability of speech-text data. To address this challenge…

External link: http://arxiv.org/abs/2406.09950

Report

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

Authors: Mu, Bingshen, Li, Yangze, Shao, Qijie, Wei, Kun, Wan, Xucheng, Zheng, Naijun, Zhou, Huan, Xie, Lei

Despite notable advances in automatic speech recognition (ASR), performance tends to degrade under adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models…

External link: http://arxiv.org/abs/2405.03152

Report

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

Authors: Wang, He, Guo, Pengcheng, Wan, Xucheng, Zhou, Huan, Xie, Lei

Automatic lip reading (ALR) aims to transcribe spoken content from a speaker's silent lip motion captured in video. Current mainstream lip-reading approaches use only a single visual encoder to model input videos at a single scale. In t…

External link: http://arxiv.org/abs/2404.05466

Report

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

Authors: Chen, Peikun, Yu, Fan, Lian, Yuhao, Xue, Hongfei, Wan, Xucheng, Zheng, Naijun, Zhou, Huan, Xie, Lei

Mixture-of-experts (MoE) based models, which use language experts to effectively extract language-specific representations, have been successfully applied to code-switching automatic speech recognition. However, there is still substantial room for improvement, as simil…

External link: http://arxiv.org/abs/2310.02629
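
The abstract's starting point, language experts whose outputs are combined for each frame, can be illustrated with a generic soft-gated mixture. This is a minimal sketch, not the BA-MoE adapter from the paper: the expert and gate functions below are hypothetical stand-ins for learned networks, and frames are plain Python lists.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over the gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_frame(frame: Vector,
              experts: Sequence[Callable[[Vector], Vector]],
              gate: Callable[[Vector], Sequence[float]]) -> Vector:
    """Mix the expert outputs for one acoustic frame using softmax gate weights.

    Generic soft MoE for illustration; experts and gate are hypothetical.
    """
    weights = softmax(gate(frame))
    outputs = [expert(frame) for expert in experts]
    dim = len(outputs[0])
    # Weighted sum of expert outputs, dimension by dimension.
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical "Mandarin" and "English" experts: fixed transforms, not trained models.
    mandarin_expert = lambda x: [2.0 * v for v in x]
    english_expert = lambda x: [0.0 for _ in x]
    neutral_gate = lambda x: [0.0, 0.0]  # equal logits give equal weights
    print(moe_frame([1.0, 2.0], [mandarin_expert, english_expert], neutral_gate))  # prints [1.0, 2.0]
```

With equal gate logits the output is the plain average of the experts; a gate that strongly prefers one language effectively routes the frame to that expert.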

Report

X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

Authors: Liu, Kai, Du, Ziqing, Wan, Xucheng, Zhou, Huan

Target speech extraction (TSE) systems are designed to extract the target speech from a multi-talker mixture. The training objective of most prior TSE networks is to improve the reconstruction performance of the extracted speech waveform. However, it ha…

External link: http://arxiv.org/abs/2303.05023
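
The abstract refers to the reconstruction performance of the extracted waveform without naming a metric. One widely used reconstruction measure for extraction and separation is the scale-invariant signal-to-distortion ratio (SI-SDR); treating it as the objective here is an assumption. A minimal pure-Python sketch, where a training loss would typically be its negative:

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB.

    Assumption: SI-SDR stands in for the "reconstruction performance" in the
    abstract; the listed paper's exact objective may differ.
    """
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


if __name__ == "__main__":
    ref = [1.0, -1.0, 1.0, -1.0]
    interference = [1.0, 1.0, -1.0, -1.0]  # orthogonal to ref
    est = [r + 0.1 * n for r, n in zip(ref, interference)]
    print(round(si_sdr(est, ref), 3))
```

Because the estimate is projected onto the reference, rescaling the estimate leaves the score unchanged, which is why SI-SDR is preferred over plain SNR when the output gain is arbitrary.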

Report

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

Authors: Liu, Kai, Wan, Xucheng, Du, Ziqing, Zhou, Huan

As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional speaker cue extracted from that speaker. Its main challenge lies in how to properly extract and lever…

External link: http://arxiv.org/abs/2301.06277

Report

Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations

Authors: Wan, Xucheng, Liu, Kai, Du, Ziqing, Zhou, Huan

To address the monaural speech enhancement problem, numerous studies have enhanced speech via operations either in the time domain, on an inner domain learned from the speech mixture, or in the time-frequency domain, on the fixed ful…

External link: http://arxiv.org/abs/2209.11905

Report

Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

Authors: Du, Ziqing, Liu, Kai, Wan, Xucheng, Zhou, Huan

Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios. Despite numerous research efforts and much progress, OSD remains an open challenge compared with speech activity detection (VAD), and its overa…

External link: http://arxiv.org/abs/2209.11906
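
The distinction the abstract draws between OSD and speech activity detection can be made concrete at the label level: a frame counts as speech if at least one speaker is active, and as overlapped if two or more are. The sketch below derives both frame-level targets from a hypothetical per-speaker activity matrix; it illustrates the task definitions only, not the paper's multi-exit model.

```python
from typing import List, Sequence, Tuple


def frame_labels(activity: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Derive VAD and OSD frame targets from per-speaker activity.

    activity[s][t] is 1 if (hypothetical) speaker s talks in frame t, else 0.
    VAD label: at least one speaker active. OSD label: two or more active.
    """
    if not activity:
        return [], []
    num_frames = len(activity[0])
    if any(len(row) != num_frames for row in activity):
        raise ValueError("every speaker row needs the same number of frames")
    # Number of simultaneously active speakers in each frame.
    counts = [sum(row[t] for row in activity) for t in range(num_frames)]
    vad = [1 if c >= 1 else 0 for c in counts]
    osd = [1 if c >= 2 else 0 for c in counts]
    return vad, osd


if __name__ == "__main__":
    activity = [
        [0, 1, 1, 1, 0],  # speaker A
        [0, 0, 1, 1, 1],  # speaker B
    ]
    print(frame_labels(activity))  # prints ([0, 1, 1, 1, 1], [0, 0, 1, 1, 0])
```

Every OSD-positive frame is also VAD-positive, so the overlap labels are nested inside the speech labels; that nesting is one plausible reason to model the two tasks jointly.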

Report

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

Authors: Tian, Qiao, Wan, Xucheng, Liu, Shan

Although the state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, problems remain. First, due to the noisy input signal of the model, there is still a gap between the quality of generated and natural wavefo…

External link: http://arxiv.org/abs/1812.02339
