Showing 1 - 10 of 30 for search: '"Liu, Linquan"'
Author:
Hu, Shujie, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Hao, Hongkun, Pan, Jing, Liu, Xunying, Li, Jinyu, Sivasankaran, Sunit, Liu, Linquan, Wei, Furu
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities…
External link:
http://arxiv.org/abs/2404.00656
Author:
Wu, Jian, Gaur, Yashesh, Chen, Zhuo, Zhou, Long, Zhu, Yimeng, Wang, Tianrui, Li, Jinyu, Liu, Shujie, Ren, Bo, Liu, Linquan, Wu, Yu
Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been…
External link:
http://arxiv.org/abs/2307.03917
Author:
Yu, Haibin, Hu, Yuxuan, Qian, Yao, Jin, Ma, Liu, Linquan, Liu, Shujie, Shi, Yu, Qian, Yanmin, Lin, Edward, Zeng, Michael
Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance. Automatic Speech Recognition (ASR) with End-to-End (E2E) modeling for such speech can be a challenging task due to the lack of data. In this…
External link:
http://arxiv.org/abs/2303.10949
Neural text-to-speech (TTS) generally consists of cascaded architecture with separately optimized acoustic model and vocoder, or end-to-end architecture with continuous mel-spectrograms or self-extracted speech frames as the intermediate representation…
External link:
http://arxiv.org/abs/2303.02939
Author:
Sun, Eric, Li, Jinyu, Hu, Yuxuan, Zhu, Yimeng, Zhou, Long, Xue, Jian, Wang, Peidong, Liu, Linquan, Liu, Shujie, Lin, Edward, Gong, Yifan
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss…
External link:
http://arxiv.org/abs/2303.00786
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
Author:
Kumatani, Kenichi, Gmyr, Robert, Salinas, Felipe Cruz, Liu, Linquan, Zuo, Wei, Patel, Devang, Sun, Eric, Shi, Yu
The sparsely-gated Mixture of Experts (MoE) can magnify a network's capacity with little computational complexity. In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm…
External link:
http://arxiv.org/abs/2112.05820
Author:
Leng, Yichong, Tan, Xu, Wang, Rui, Zhu, Linchen, Xu, Jin, Liu, Wenjie, Liu, Linquan, Qin, Tao, Li, Xiang-Yang, Lin, Edward, Liu, Tie-Yan
Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence, and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current…
External link:
http://arxiv.org/abs/2109.14420
Author:
Leng, Yichong, Tan, Xu, Zhu, Linchen, Xu, Jin, Luo, Renqian, Liu, Linquan, Qin, Tao, Li, Xiang-Yang, Lin, Ed, Liu, Tie-Yan
Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original ASR outputs. Previous works usually use a sequence-to-sequence model to correct…
External link:
http://arxiv.org/abs/2105.03842
Academic article
Published in:
Speech Communication, 2008, 50(7):605-615