Search results

Report

On the Effectiveness of Acoustic BPE in Decoder-Only TTS

Authors: Li, Bohan; Shen, Feiyu; Guo, Yiwei; Wang, Shuai; Chen, Xie; Yu, Kai

Discretizing speech into tokens and generating them by a decoder-only model have been a promising direction for text-to-speech (TTS) and spoken language modeling (SLM). To shorten the sequence length of speech tokens, acoustic byte-pair encoding (BPE

External link: http://arxiv.org/abs/2407.03892

Report

Topological defects of 2+1D systems from line excitations in 3+1D bulk

Authors: Ji, Wenjie; Chen, Xie

The bulk-boundary correspondence of topological phases suggests strong connections between the topological features in a d+1-dimensional bulk and the potentially gapless theory on the (d-1)+1-dimensional boundary. In 2+1D topological phases, a direct

External link: http://arxiv.org/abs/2407.02488

Report

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

Authors: Song, Yakun; Chen, Zhuo; Wang, Xiaofei; Ma, Ziyang; Yang, Guanrou; Chen, Xie

Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit ali

External link: http://arxiv.org/abs/2406.15752

Report

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Authors: Jiang, Anbai; Han, Bing; Lv, Zhiqiang; Deng, Yufeng; Zhang, Wei-Qiang; Chen, Xie; Qian, Yanmin; Liu, Jia; Fan, Pingyi

Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machi

External link: http://arxiv.org/abs/2406.11364

Report

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Authors: Yang, Yifan; Song, Zheshu; Zhuo, Jianheng; Cui, Mingyu; Li, Jinpeng; Yang, Bo; Du, Yexing; Ma, Ziyang; Liu, Xunying; Wang, Ziyuan; Li, Ke; Fan, Shuai; Yu, Kai; Zhang, Wei-Qiang; Chen, Guoguo; Chen, Xie

The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpe

External link: http://arxiv.org/abs/2406.11546

Report

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Authors: Chang, Xuankai; Shi, Jiatong; Tian, Jinchuan; Wu, Yuning; Tang, Yuxun; Wu, Yihan; Watanabe, Shinji; Adi, Yossi; Chen, Xie; Jin, Qin

Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compr

External link: http://arxiv.org/abs/2406.07725

Report

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Authors: Ma, Ziyang; Chen, Mingjie; Zhang, Hezhao; Zheng, Zhisheng; Chen, Wenxi; Li, Xiquan; Ye, Jiaxin; Chen, Xie; Hain, Thomas

Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are

External link: http://arxiv.org/abs/2406.07162

Report

MaLa-ASR: Multimedia-Assisted LLM-Based ASR

Authors: Yang, Guanrou; Ma, Ziyang; Yu, Fan; Gao, Zhifu; Zhang, Shiliang; Chen, Xie

As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh per

External link: http://arxiv.org/abs/2406.05839

Report

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

Authors: Song, Zheshu; Zhuo, Jianheng; Yang, Yifan; Ma, Ziyang; Zhang, Shixiong; Chen, Xie

Recent years have witnessed significant progress in multilingual automatic speech recognition (ASR), driven by the emergence of end-to-end (E2E) models and the scaling of multilingual datasets. Despite that, two main challenges persist in multilingua

External link: http://arxiv.org/abs/2406.06619

Report

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Authors: Chen, Mingjie; Zhang, Hezhao; Li, Yuanchao; Luo, Jiachen; Wu, Wen; Ma, Ziyang; Bell, Peter; Lai, Catherine; Reiss, Joshua; Wang, Lin; Woodland, Philip C.; Chen, Xie; Phan, Huy; Hain, Thomas

Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to s

External link: http://arxiv.org/abs/2405.20064
