Showing 1 - 10 of 2,545 for search: '"Shen, Kai"'
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence 2024
The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many questions mappin
External link:
http://arxiv.org/abs/2407.05100
In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing v
External link:
http://arxiv.org/abs/2406.07119
Authors:
Xin, Detai, Tan, Xu, Shen, Kai, Ju, Zeqian, Yang, Dongchao, Wang, Yuancheng, Takamichi, Shinnosuke, Saruwatari, Hiroshi, Liu, Shujie, Li, Jinyu, Zhao, Sheng
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as
External link:
http://arxiv.org/abs/2404.03204
Authors:
Ju, Zeqian, Wang, Yuancheng, Shen, Kai, Tan, Xu, Xin, Detai, Yang, Dongchao, Liu, Yanqing, Leng, Yichong, Song, Kaitao, Tang, Siliang, Wu, Zhizheng, Qin, Tao, Li, Xiang-Yang, Ye, Wei, Zhang, Shikun, Bian, Jiang, He, Lei, Li, Jinyu, Zhao, Sheng
While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre,
External link:
http://arxiv.org/abs/2403.03100
Authors:
Park, Seo Jin, Govindan, Ramesh, Shen, Kai, Culler, David, Özcan, Fatma, Kim, Geon-Woo, Levy, Hank
Traditional cluster designs were originally server-centric, and have evolved recently to support hardware acceleration and storage disaggregation. In applications that leverage acceleration, the server CPU performs the role of orchestrating computati
External link:
http://arxiv.org/abs/2309.12665
Authors:
Leng, Yichong, Guo, Zhifang, Shen, Kai, Tan, Xu, Ju, Zeqian, Liu, Yanqing, Liu, Yufei, Yang, Dongchao, Zhang, Leying, Song, Kaitao, He, Lei, Li, Xiang-Yang, Zhao, Sheng, Qin, Tao, Bian, Jiang
Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using
External link:
http://arxiv.org/abs/2309.02285
Authors:
Shen, Kai, Ju, Zeqian, Tan, Xu, Liu, Yanqing, Leng, Yichong, He, Lei, Qin, Tao, Zhao, Sheng, Bian, Jiang
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize s
External link:
http://arxiv.org/abs/2304.09116
Authors:
Tran, Charlie, Shen, Kai, Liu, Kang, Ashok, Akshay, Ramirez-Zamora, Adolfo, Chen, Jinghua, Li, Yulin, Fang, Ruogu
Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnosti
External link:
http://arxiv.org/abs/2302.06727
The Transformer architecture consists of self-attention and feed-forward networks (FFNs) which can be viewed as key-value memories according to previous works. However, FFN and traditional memory utilize different activation functions (i.e., ReLU and
External link:
http://arxiv.org/abs/2302.06461
Text error correction aims to correct the errors in text sequences such as those typed by humans or generated by speech recognition models. Previous error correction methods usually take the source (incorrect) sentence as encoder input and generate t
External link:
http://arxiv.org/abs/2211.13252