Search results

Report

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Authors: Chen, Sanyuan, Liu, Shujie, Zhou, Long, Liu, Yanqing, Tan, Xu, Li, Jinyu, Zhao, Sheng, Qian, Yao, Wei, Furu

This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration

External link: http://arxiv.org/abs/2406.05370

Report

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Authors: Le, Chenyang, Qian, Yao, Wang, Dongmei, Zhou, Long, Liu, Shujie, Wang, Xiaofei, Yousefi, Midia, Qian, Yanmin, Li, Jinyu, Zhao, Sheng, Zeng, Michael

There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeli

External link: http://arxiv.org/abs/2405.17809

Report

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Authors: Zhang, Leying, Qian, Yao, Zhou, Long, Liu, Shujie, Wang, Dongmei, Wang, Xiaofei, Yousefi, Midia, Qian, Yanmin, Li, Jinyu, He, Lei, Zhao, Sheng, Zeng, Michael

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a chal

External link: http://arxiv.org/abs/2404.06690

Report

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Authors: Zhang, Leying, Qian, Yao, Yu, Linfeng, Wang, Heming, Wang, Xinkai, Yang, Hemin, Zhou, Long, Liu, Shujie, Qian, Yanmin, Zeng, Michael

Target Speech Extraction (TSE) is a crucial task in speech processing that focuses on isolating the clean speech of a specific speaker from complex mixtures. While discriminative methods are commonly used for TSE, they can introduce distortion in ter

External link: http://arxiv.org/abs/2309.13874

Report

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

Authors: Ling, Shaoshi, Hu, Yuxuan, Qian, Shuangbei, Ye, Guoli, Qian, Yao, Gong, Yifan, Lin, Ed, Zeng, Michael

Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions. Pretrained large language models (LLMs) have the potential to improve the performance of E2E ASR. Howeve

External link: http://arxiv.org/abs/2307.08234

Report

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

Authors: Li, Chenda, Qian, Yao, Chen, Zhuo, Kanda, Naoyuki, Wang, Dongmei, Yoshioka, Takuya, Qian, Yanmin, Zeng, Michael

State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages. However, it remains a challenge for these models to recognize overlapped speech, which is

External link: http://arxiv.org/abs/2305.18747

Report

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

Authors: Le, Chenyang, Qian, Yao, Zhou, Long, Liu, Shujie, Qian, Yanmin, Zeng, Michael, Huang, Xuedong

Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of pub

External link: http://arxiv.org/abs/2305.14838

Report

i-Code Studio: A Configurable and Composable Framework for Integrative AI

Authors: Fang, Yuwei, Khademi, Mahmoud, Zhu, Chenguang, Yang, Ziyi, Pryzant, Reid, Xu, Yichong, Qian, Yao, Yoshioka, Takuya, Yuan, Lu, Zeng, Michael, Huang, Xuedong

Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combin

External link: http://arxiv.org/abs/2305.13738

Report

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Authors: Yang, Ziyi, Khademi, Mahmoud, Xu, Yichong, Pryzant, Reid, Fang, Yuwei, Zhu, Chenguang, Chen, Dongdong, Qian, Yao, Gao, Mei, Chen, Yi-Ling, Gmyr, Robert, Kanda, Naoyuki, Codella, Noel, Xiao, Bin, Shi, Yu, Yuan, Lu, Yoshioka, Takuya, Zeng, Michael, Huang, Xuedong

The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing thi

External link: http://arxiv.org/abs/2305.12311

Academic article

Prevalence and risk factors for isolated systolic hypertension among the oldest‐old population in southwestern China: A community‐based cross‐sectional study

Authors: Xiaobo Huang, Lingli Qiu, Tzung‐Dau Wang, Qian Yao, Jianxiong Liu, Ronghua Xu, Qingkun Zheng, Xingping Zhang, Jinhui Wu

Published in: The Journal of Clinical Hypertension, Vol 26, Iss 7, Pp 757-764 (2024)

The prevalence of isolated systolic hypertension (ISH) has doubled between 2002–2005 and 2014 among the oldest‐old population in China. However, the prevalence and characteristics of ISH among the oldest‐old population in southwestern

External link: https://doaj.org/article/13842b7ce4bd44cf9706454df665f555