Showing 1 - 10 of 148 for search: '"Wang, Peidong"'
Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In s…
External link:
http://arxiv.org/abs/2406.10276
Author:
Zhang, Lingxia, Lin, Xiaodie, Wang, Peidong, Yang, Kaiyan, Zeng, Xiao, Wei, Zhaohui, Wang, Zizhu
Optimization is one of the keystones of modern science and engineering. Its applications in quantum technology and machine learning helped nurture variational quantum algorithms and generative AI, respectively. We propose a general approach to design…
External link:
http://arxiv.org/abs/2404.18041
Author:
Zhang, Yiqun, Kong, Fanheng, Wang, Peidong, Sun, Shuang, Wang, Lingshuai, Feng, Shi, Wang, Daling, Zhang, Yifei, Song, Kaisong
Stickers, while widely recognized for enhancing empathetic communication in online interactions, remain underexplored in current empathetic dialogue research, notably due to the lack of comprehensive datasets. In this paper, we introdu…
External link:
http://arxiv.org/abs/2402.01679
The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditio…
External link:
http://arxiv.org/abs/2310.14806
Simultaneous speech-to-text translation serves a critical role in real-time cross-lingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in t…
External link:
http://arxiv.org/abs/2310.04399
Author:
Yang, Mu, Kanda, Naoyuki, Wang, Xiaofei, Chen, Junkun, Wang, Peidong, Xue, Jian, Li, Jinyu, Yoshioka, Takuya
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges, such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we p…
External link:
http://arxiv.org/abs/2309.08007
In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transforme…
External link:
http://arxiv.org/abs/2307.03354
Author:
Sun, Eric, Li, Jinyu, Hu, Yuxuan, Zhu, Yimeng, Zhou, Long, Xue, Jian, Wang, Peidong, Liu, Linquan, Liu, Shujie, Lin, Edward, Gong, Yifan
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss…
External link:
http://arxiv.org/abs/2303.00786
Author:
Huang, Zili, Chen, Zhuo, Kanda, Naoyuki, Wu, Jian, Wang, Yiming, Li, Jinyu, Yoshioka, Takuya, Wang, Xiaofei, Wang, Peidong
Self-supervised learning (SSL), which utilizes the input data itself for representation learning, has achieved state-of-the-art results for various downstream speech tasks. However, most of the previous studies focused on offline single-talker applic…
External link:
http://arxiv.org/abs/2211.05564
Author:
Wang, Peidong, Sun, Eric, Xue, Jian, Wu, Yu, Zhou, Long, Gaur, Yashesh, Liu, Shujie, Li, Jinyu
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to perform both tasks. In real-world applications, such joint ASR and ST model…
External link:
http://arxiv.org/abs/2211.02809