Showing 1 - 5 of 5 for search: '"Meng, Chutong"'
With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing overall perfo
External link:
http://arxiv.org/abs/2309.00169
Authors:
Mei, Xinhao, Meng, Chutong, Liu, Haohe, Kong, Qiuqiang, Ko, Tom, Zhao, Chengqi, Plumbley, Mark D., Zou, Yuexian, Wang, Wenwu
The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years. However, researchers face challenges due to the costly and time-consuming collection process of existing audio-language datasets, which are limited
External link:
http://arxiv.org/abs/2303.17395
Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance to a sequence
External link:
http://arxiv.org/abs/2210.04062
This paper introduces GigaST, a large-scale pseudo speech translation (ST) corpus. We create the corpus by translating the text in GigaSpeech, an English ASR corpus, into German and Chinese. The training set is translated by a strong machine translat
External link:
http://arxiv.org/abs/2204.03939