Showing 1 - 10 of 153 for search: '"Zeng, Michael"'
Author:
Le, Chenyang, Qian, Yao, Wang, Dongmei, Zhou, Long, Liu, Shujie, Wang, Xiaofei, Yousefi, Midia, Qian, Yanmin, Li, Jinyu, Zhao, Sheng, Zeng, Michael
There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeli
External link:
http://arxiv.org/abs/2405.17809
Author:
Zhang, Leying, Qian, Yao, Zhou, Long, Liu, Shujie, Wang, Dongmei, Wang, Xiaofei, Yousefi, Midia, Qian, Yanmin, Li, Jinyu, He, Lei, Zhao, Sheng, Zeng, Michael
Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a chal
External link:
http://arxiv.org/abs/2404.06690
Author:
Kanda, Naoyuki, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Xia, Yufei, Li, Jinzhu, Liu, Yanqing, Zhao, Sheng, Zeng, Michael
Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their a
External link:
http://arxiv.org/abs/2402.07383
Author:
Xiao, Bin, Wu, Haiping, Xu, Weijian, Dai, Xiyang, Hu, Houdong, Lu, Yumao, Zeng, Michael, Liu, Ce, Yuan, Lu
We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a
External link:
http://arxiv.org/abs/2311.06242
Author:
Zhang, Leying, Qian, Yao, Yu, Linfeng, Wang, Heming, Wang, Xinkai, Yang, Hemin, Zhou, Long, Liu, Shujie, Qian, Yanmin, Zeng, Michael
Target Speech Extraction (TSE) is a crucial task in speech processing that focuses on isolating the clean speech of a specific speaker from complex mixtures. While discriminative methods are commonly used for TSE, they can introduce distortion in ter
External link:
http://arxiv.org/abs/2309.13874
Author:
Ling, Shaoshi, Hu, Yuxuan, Qian, Shuangbei, Ye, Guoli, Qian, Yao, Gong, Yifan, Lin, Ed, Zeng, Michael
Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions. Pretrained large language models (LLMs) have the potential to improve the performance of E2E ASR. Howeve
External link:
http://arxiv.org/abs/2307.08234
Author:
Li, Chenda, Qian, Yao, Chen, Zhuo, Kanda, Naoyuki, Wang, Dongmei, Yoshioka, Takuya, Qian, Yanmin, Zeng, Michael
State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages. However, it remains a challenge for these models to recognize overlapped speech, which is
External link:
http://arxiv.org/abs/2305.18747
Author:
Le, Chenyang, Qian, Yao, Zhou, Long, Liu, Shujie, Qian, Yanmin, Zeng, Michael, Huang, Xuedong
Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of pub
External link:
http://arxiv.org/abs/2305.14838
Author:
Fang, Yuwei, Khademi, Mahmoud, Zhu, Chenguang, Yang, Ziyi, Pryzant, Reid, Xu, Yichong, Qian, Yao, Yoshioka, Takuya, Yuan, Lu, Zeng, Michael, Huang, Xuedong
Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combin
External link:
http://arxiv.org/abs/2305.13738
Author:
Xu, Ruochen, Wang, Song, Liu, Yang, Wang, Shuohang, Xu, Yichong, Iter, Dan, Zhu, Chenguang, Zeng, Michael
Query-focused summarization (QFS) aims to extract or generate a summary of an input document that directly answers or is relevant to a given query. The lack of large-scale datasets in the form of documents, queries, and summaries has hindered model d
External link:
http://arxiv.org/abs/2305.13086