Showing 1 - 10 of 23 for the search: '"Tan, Daxin"'
Author:
Chen, Kai, Gou, Yunhao, Huang, Runhui, Liu, Zhili, Tan, Daxin, Xu, Jing, Wang, Chunwei, Zhu, Yi, Zeng, Yihan, Yang, Kuo, Wang, Dingdong, Xiang, Kun, Li, Haoyuan, Bai, Haoli, Han, Jianhua, Li, Xiaohui, Jin, Weike, Xie, Nian, Zhang, Yu, Kwok, James T., Zhao, Hengshuang, Liang, Xiaodan, Yeung, Dit-Yan, Chen, Xiao, Li, Zhenguo, Zhang, Wei, Liu, Qun, Hong, Lanqing, Hou, Lu, Xu, Hang
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-end …
External link:
http://arxiv.org/abs/2409.18042
While large language models (LLMs) have been explored in the speech domain for both generation and recognition tasks, their applications are predominantly confined to the monolingual scenario, with limited exploration in multilingual and code-switched …
External link:
http://arxiv.org/abs/2409.10969
Author:
Cui, Mingyu, Tan, Daxin, Yang, Yifan, Wang, Dingdong, Wang, Huimeng, Chen, Xiao, Chen, Xie, Liu, Xunying
With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has been growing interest in utilizing discrete tokens generated by SSL for automatic speech recognition (ASR), as they offer faster processing techniques. However, …
External link:
http://arxiv.org/abs/2409.08805
Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments …
External link:
http://arxiv.org/abs/2406.08989
Author:
Tan, Daxin, Kargas, Nikos, McHardy, David, Papayiannis, Constantinos, Bonafonte, Antonio, Strelec, Marek, Rohnke, Jonas, Filandras, Agis Oikonomou, Wood, Trevor
Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversation. It has been found in different dimensions, such as acoustic, prosodic, lexical, or syntactic. In this work, we explore and utilize …
External link:
http://arxiv.org/abs/2212.03398
This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario in which a recorded speech audio contains certain errors, e.g., inappropriate words or mispronunciations, that need to be corrected. …
External link:
http://arxiv.org/abs/2204.05460
Author:
Zhang, Guangyan, Song, Kaitao, Tan, Xu, Tan, Daxin, Yan, Yuzi, Liu, Yanqing, Wang, Gang, Zhou, Wei, Qin, Tao, Lee, Tan, Zhao, Sheng
Recently, leveraging BERT pre-training to improve the phoneme encoder in text-to-speech (TTS) has drawn increasing attention. However, these works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with …
External link:
http://arxiv.org/abs/2203.17190
This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in T…
External link:
http://arxiv.org/abs/2110.03887
Author:
Zhang, Guangyan, Leng, Yichong, Tan, Daxin, Qin, Ying, Song, Kaitao, Tan, Xu, Zhao, Sheng, Lee, Tan
In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for the target speaker(s), the actual benefits of m…
External link:
http://arxiv.org/abs/2110.03857
This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE …
External link:
http://arxiv.org/abs/2108.02821