Search results

Report

Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Author: Gong, Xun; Wu, Yu; Li, Jinyu; Liu, Shujie; Zhao, Rui; Chen, Xie; Qian, Yanmin

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios

External link: http://arxiv.org/abs/2403.13423

Report

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Author: Ju, Zeqian; Wang, Yuancheng; Shen, Kai; Tan, Xu; Xin, Detai; Yang, Dongchao; Liu, Yanqing; Leng, Yichong; Song, Kaitao; Tang, Siliang; Wu, Zhizheng; Qin, Tao; Li, Xiang-Yang; Ye, Wei; Zhang, Shikun; Bian, Jiang; He, Lei; Li, Jinyu; Zhao, Sheng

While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre,

External link: http://arxiv.org/abs/2403.03100

Report

Boosting Large Language Model for Speech Synthesis: An Empirical Study

Author: Hao, Hongkun; Zhou, Long; Liu, Shujie; Li, Jinyu; Hu, Shujie; Wang, Rui; Wei, Furu

Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prom

External link: http://arxiv.org/abs/2401.00246

Report

Future Intelligent Data link and Unit-Level Combat System Based on Global Combat Cloud

Author: Ma, Xinyan; Li, Wei; Zhong, Jian; Li, Jinyu; Wang, Zheng

The development of U.S. Army and NATO data link systems is introduced first, and then the development trend of future intelligent data link is summarized into integration, generalization, multifunctionality and high security. A unit-level combat syst

External link: http://arxiv.org/abs/2401.05358

Report

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Author: Pan, Jing; Wu, Jian; Gaur, Yashesh; Sivasankaran, Sunit; Chen, Zhuo; Liu, Shujie; Li, Jinyu

We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech C

External link: http://arxiv.org/abs/2311.02248

Report

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Author: Papi, Sara; Wang, Peidong; Chen, Junkun; Xue, Jian; Kanda, Naoyuki; Li, Jinyu; Gaur, Yashesh

The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditio

External link: http://arxiv.org/abs/2310.14806

Report

RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

Author: Li, Jinyu; Pan, Xiaokun; Huang, Gan; Zhang, Ziyang; Wang, Nan; Bao, Hujun; Zhang, Guofeng

Published in: IEEE Transactions on Visualization and Computer Graphics, 2024

It is typically challenging for visual or visual-inertial odometry systems to handle the problems of dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these two pro

External link: http://arxiv.org/abs/2310.15072

Report

Enhanced Edge-Perceptual Guided Image Filtering

Author: Li, Jinyu

Due to the powerful edge-preserving ability and low computational complexity, Guided image filter (GIF) and its improved versions have been widely applied in computer vision and image processing. However, all of them suffer from halo artifacts to som

External link: http://arxiv.org/abs/2310.10387

Report

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Author: Chen, Junkun; Xue, Jian; Wang, Peidong; Pan, Jing; Li, Jinyu

Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in t

External link: http://arxiv.org/abs/2310.04399

Report

ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers

Author: Wang, Yiming; Li, Jinyu

Memory constraint of always-on devices is one of the major concerns when deploying speech processing models on these devices. While larger models trained with sufficiently large amount of data generally perform better, making them fit in the device m

External link: http://arxiv.org/abs/2310.02489
