Search results

MA-CDMR: An Intelligent Cross-domain Multicast Routing Method based on Multiagent Deep Reinforcement Learning in Multi-domain SDWN

Authors: Ye, Miao; Hu, Hongwen; Wang, Xiaoli; Wang, Yuping; Wang, Yong; Peng, Wen; Zheng, Jihao

The cross-domain multicast routing problem in a software-defined wireless network with multiple controllers is a classic NP-hard optimization problem. As the network size increases, designing and implementing cross-domain multicast routing paths in t

External link: http://arxiv.org/abs/2409.05888

Language Model Can Listen While Speaking

Authors: Ma, Ziyang; Song, Yakun; Du, Chenpeng; Cong, Jian; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Chen, Xie

Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversati

External link: http://arxiv.org/abs/2408.02622

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

Authors: Wang, Zhichao; Chen, Yuanzhe; Wang, Xinsheng; Xie, Lei; Wang, Yuping

StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) int

External link: http://arxiv.org/abs/2408.02178

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

A modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-e

External link: http://arxiv.org/abs/2407.04675

Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions

Authors: Yuan, Yi; Jia, Dongya; Zhuang, Xiaobin; Chen, Yuanzhe; Liu, Zhengxi; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Liu, Xubo; Kang, Xiyuan; Plumbley, Mark D.; Wang, Wenwu

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the simpli

External link: http://arxiv.org/abs/2407.04416

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in sp

External link: http://arxiv.org/abs/2406.02430

Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare

Authors: Li, Xingyu; Peng, Lu; Wang, Yuping; Zhang, Weihua

This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) for advancing biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, wh

External link: http://arxiv.org/abs/2405.06784

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

Authors: Anastassiou, Philip; Tang, Zhenyu; Peng, Kainan; Jia, Dongya; Li, Jiaxin; Tu, Ming; Wang, Yuping; Wang, Yuxuan; Ma, Mingbo

We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been cons

External link: http://arxiv.org/abs/2404.06674

CMP: Cooperative Motion Prediction with Multi-Agent Communication

Authors: Wang, Zehao; Wang, Yuping; Wu, Zhuoyuan; Ma, Hengbo; Li, Zhaowei; Qiu, Hang; Li, Jiachen

The confluence of the advancement of Autonomous Vehicles (AVs) and the maturity of Vehicle-to-Everything (V2X) communication has enabled the capability of cooperative connected and automated vehicles (CAVs). Building on top of cooperative perception,

External link: http://arxiv.org/abs/2403.17916
