Showing 11 - 20
of 2,053
for search: '"Li, Jinyu"'
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024
In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios
External link:
http://arxiv.org/abs/2403.13423
Author:
Ju, Zeqian, Wang, Yuancheng, Shen, Kai, Tan, Xu, Xin, Detai, Yang, Dongchao, Liu, Yanqing, Leng, Yichong, Song, Kaitao, Tang, Siliang, Wu, Zhizheng, Qin, Tao, Li, Xiang-Yang, Ye, Wei, Zhang, Shikun, Bian, Jiang, He, Lei, Li, Jinyu, Zhao, Sheng
While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre,
External link:
http://arxiv.org/abs/2403.03100
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prom
External link:
http://arxiv.org/abs/2401.00246
The development of U.S. Army and NATO data link systems is introduced first, and then the development trend of future intelligent data link is summarized into integration, generalization, multifunctionality and high security. A unit-level combat syst
External link:
http://arxiv.org/abs/2401.05358
We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech C
External link:
http://arxiv.org/abs/2311.02248
The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditio
External link:
http://arxiv.org/abs/2310.14806
Published in:
IEEE Transactions on Visualization and Computer Graphics, 2024
It is typically challenging for visual or visual-inertial odometry systems to handle the problems of dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these two pro
External link:
http://arxiv.org/abs/2310.15072
Author:
Li, Jinyu
Owing to their powerful edge-preserving ability and low computational complexity, the guided image filter (GIF) and its improved versions have been widely applied in computer vision and image processing. However, all of them suffer from halo artifacts to som
External link:
http://arxiv.org/abs/2310.10387
Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in t
External link:
http://arxiv.org/abs/2310.04399
Author:
Wang, Yiming, Li, Jinyu
The memory constraint of always-on devices is one of the major concerns when deploying speech processing models on these devices. While larger models trained with a sufficiently large amount of data generally perform better, making them fit in the device m
External link:
http://arxiv.org/abs/2310.02489