Showing 1 - 10 of 774 for search: '"Zhang, ZiQiang"'
Author:
Su, Shuo, Chen, Xiaoshuang, Wang, Yao, Wu, Yulin, Zhang, Ziqiang, Zhan, Kaiqiao, Wang, Ben, Gai, Kun
Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise…
External link:
http://arxiv.org/abs/2409.13175
Most existing GAN inversion methods either achieve accurate reconstruction but lack editability or offer strong editability at the cost of fidelity. Hence, how to balance the distortion-editability trade-off is a significant challenge for GAN inversion…
External link:
http://arxiv.org/abs/2312.07079
Published in:
Redai dili, Vol 44, Iss 10, Pp 1900-1914 (2024)
Economic forests are crucial sources of food and nutrition. To guarantee national grain and oil security, it is extremely important to investigate the spatial association between cutting quotas and economic forest planting. The logging quota scheme…
External link:
https://doaj.org/article/601bb0b849234c23ab7542902f99d28c
Author:
Wang, Tianrui, Zhou, Long, Zhang, Ziqiang, Wu, Yu, Liu, Shujie, Gaur, Yashesh, Chen, Zhuo, Li, Jinyu, Wei, Furu
Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities. In this paper, we propose VioLA, a single auto-regressive Transformer decoder-only network that unifies…
External link:
http://arxiv.org/abs/2305.16107
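The VioLA entry above describes a single auto-regressive, decoder-only network covering tasks that span speech and text. As a rough, hypothetical illustration of how several such tasks can be serialized into one token stream for one language model (the task tags, vocabularies, and layouts below are invented for this sketch and are not VioLA's actual format):

```python
# Hedged sketch: one decoder-only LM, many tasks, each expressed as
# "continue a token sequence". Tags and vocabularies are toy assumptions.
TEXT_TOKENS  = {"hello": 10, "world": 11}             # toy text vocabulary
AUDIO_TOKENS = {f"a{i}": 100 + i for i in range(8)}   # toy codec-token vocabulary
TASK_TAGS    = {"<asr>": 0, "<tts>": 1, "<mt>": 2}    # one tag per task

def make_example(task, prompt_tokens, target_tokens):
    """Serialize a task as [task tag] + prompt + target, so a causal LM
    learns the target as the continuation of the prompt."""
    return [TASK_TAGS[task]] + prompt_tokens + target_tokens

# ASR-style example: audio tokens in, text tokens out.
asr = make_example("<asr>", [AUDIO_TOKENS["a0"], AUDIO_TOKENS["a3"]],
                   [TEXT_TOKENS["hello"]])
# TTS-style example: text tokens in, audio tokens out.
tts = make_example("<tts>", [TEXT_TOKENS["hello"], TEXT_TOKENS["world"]],
                   [AUDIO_TOKENS["a5"], AUDIO_TOKENS["a6"]])
print(asr)
print(tts)
```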
Author:
Zhang, Ziqiang, Zhou, Long, Wang, Chengyi, Chen, Sanyuan, Wu, Yu, Liu, Shujie, Chen, Zhuo, Liu, Yanqing, Wang, Huaming, Li, Jinyu, He, Lei, Zhao, Sheng, Wei, Furu
We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target language…
External link:
http://arxiv.org/abs/2303.03926
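The VALL-E X entry above hinges on conditioning a codec language model on information from both languages. Below is a minimal, hypothetical sketch of such a prompt layout (source phonemes, target phonemes, and codec tokens of the source utterance); every token name here is invented, and the actual VALL-E X format may differ:

```python
# Hedged sketch: build the conditioning sequence that a cross-lingual codec LM
# would continue with target-language codec tokens, so the generated speech
# can keep the source speaker's voice. All tokens below are made up.
SRC_PHONES = ["n", "i", "h", "ao"]        # hypothetical source-language phonemes
TGT_PHONES = ["HH", "AH", "L", "OW"]      # hypothetical target-language phonemes
SRC_CODES  = [734, 12, 987, 55]           # dummy codec tokens of the source utterance

def build_prompt(src_phones, tgt_phones, src_codes):
    """Token layout the model is trained to continue with target codec tokens."""
    return (["<src_text>"] + src_phones
            + ["<tgt_text>"] + tgt_phones
            + ["<src_audio>"] + [f"c{c}" for c in src_codes]
            + ["<tgt_audio>"])            # generation starts after this tag

print(build_prompt(SRC_PHONES, TGT_PHONES, SRC_CODES))
```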
Author:
Wang, Chengyi, Chen, Sanyuan, Wu, Yu, Zhang, Ziqiang, Zhou, Long, Liu, Shujie, Chen, Zhuo, Liu, Yanqing, Wang, Huaming, Li, Jinyu, He, Lei, Zhao, Sheng, Wei, Furu
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional…
External link:
http://arxiv.org/abs/2301.02111
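To make "TTS as conditional language modeling over discrete codec tokens" concrete, here is a small PyTorch sketch under toy assumptions (a single codebook, invented vocabulary sizes, and one causal Transformer standing in for the full model); it is not the authors' implementation:

```python
# Hedged sketch of the idea above: prepend text (phoneme) tokens, then
# predict discrete audio-codec tokens autoregressively with a causal LM.
# Vocabulary sizes, dimensions, and the single-codebook setup are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodecLM(nn.Module):
    def __init__(self, n_phone=512, n_code=1024, d_model=256, n_layer=4):
        super().__init__()
        self.phone_emb = nn.Embedding(n_phone, d_model)
        self.code_emb = nn.Embedding(n_code, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layer)  # causal via mask
        self.head = nn.Linear(d_model, n_code)

    def forward(self, phones, codes):
        # One sequence: phoneme prompt followed by codec tokens of the speech.
        x = torch.cat([self.phone_emb(phones), self.code_emb(codes)], dim=1)
        T = x.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(x, mask=causal)
        # Keep only the positions whose next token is a codec token.
        return self.head(h[:, phones.size(1) - 1 : -1])

phones = torch.randint(0, 512, (2, 20))    # dummy phoneme ids (the text prompt)
codes = torch.randint(0, 1024, (2, 100))   # dummy codec ids (e.g. from a neural codec)
logits = CodecLM()(phones, codes)          # (2, 100, 1024) next-codec-token logits
loss = F.cross_entropy(logits.reshape(-1, 1024), codes.reshape(-1))
print(loss.item())
```

At inference time the same kind of model would be fed only the phoneme prompt (plus an acoustic prompt for the desired voice) and sampled token by token, with the codec's decoder turning the generated tokens back into a waveform.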
Author:
Zhu, Qiushi, Zhou, Long, Zhang, Ziqiang, Liu, Shujie, Jiao, Binxing, Zhang, Jie, Dai, Lirong, Jiang, Daxin, Li, Jinyu, Wei, Furu
Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate different modal information…
External link:
http://arxiv.org/abs/2211.11275
Author:
Wei, Kun, Zhou, Long, Zhang, Ziqiang, Chen, Liping, Liu, Shujie, He, Lei, Li, Jinyu, Wei, Furu
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST. However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language…
External link:
http://arxiv.org/abs/2210.17027
The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations…
External link:
http://arxiv.org/abs/2210.03730
Author:
Zhang, Ziqiang, Chen, Sanyuan, Zhou, Long, Wu, Yu, Ren, Shuo, Liu, Shujie, Yao, Zhuoyuan, Gong, Xun, Dai, Lirong, Li, Jinyu, Wei, Furu
How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to…
External link:
http://arxiv.org/abs/2209.15329