Showing 1 - 10 of 117 for the search: "Chen, Junkun"
Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. …
External link:
http://arxiv.org/abs/2406.10276
The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. …
External link:
http://arxiv.org/abs/2310.14806
Simultaneous speech-to-text translation serves a critical role in real-time cross-lingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in …
External link:
http://arxiv.org/abs/2310.04399
Author:
Yang, Mu; Kanda, Naoyuki; Wang, Xiaofei; Chen, Junkun; Wang, Peidong; Xue, Jian; Li, Jinyu; Yoshioka, Takuya
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges, such as speaker diarization (SD) without accurate word timestamps and handling of overlapping speech in a streaming fashion. In this work, we …
External link:
http://arxiv.org/abs/2309.08007
In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transformer …
External link:
http://arxiv.org/abs/2307.03354
Author:
Fan, Xiaoran; Pang, Chao; Yuan, Tian; Bai, He; Zheng, Renjie; Zhu, Pengfei; Wang, Shuohuan; Chen, Junkun; Chen, Zeyu; Huang, Liang; Sun, Yu; Wu, Hua
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method for cross-lingual …
External link:
http://arxiv.org/abs/2211.03545
Self-supervised contrastive learning is a powerful tool for learning visual representations without labels. Prior work has primarily focused on evaluating the recognition accuracy of various pre-training algorithms, but has overlooked other behavioral aspects …
External link:
http://arxiv.org/abs/2206.05259
Author:
Zhang, Hui; Yuan, Tian; Chen, Junkun; Li, Xintong; Zheng, Renjie; Huang, Yuxin; Chen, Xiaojie; Gong, Enlei; Chen, Zeyu; Hu, Xiaoguang; Yu, Dianhai; Ma, Yanjun; Huang, Liang
PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the …
External link:
http://arxiv.org/abs/2205.12007
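For a sense of the easy-to-use interface the abstract mentions, here is a minimal sketch of calling PaddleSpeech's ASR executor from Python. The module path, the ASRExecutor class, and the audio_file argument follow the project's public documentation but should be treated as assumptions to verify against the installed version; "input_16k.wav" is a placeholder file name.

    # Minimal PaddleSpeech ASR sketch (assumed API; verify against the
    # PaddleSpeech docs for your installed version).
    from paddlespeech.cli.asr.infer import ASRExecutor

    asr = ASRExecutor()
    # Transcribe a 16 kHz mono WAV file; "input_16k.wav" is a placeholder.
    text = asr(audio_file="input_16k.wav")
    print(text)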
Author:
Xun, Guangxu; Ma, Mingbo; Bian, Yuchen; Cai, Xingyu; Huang, Jiaji; Zheng, Renjie; Chen, Junkun; Yuan, Jiahong; Church, Kenneth; Huang, Liang
In simultaneous translation (SimulMT), the most widely used strategy is the wait-k policy, thanks to its simplicity and effectiveness in balancing translation quality and latency. However, wait-k suffers from two major limitations: (a) it is a fixed policy … (a sketch of the wait-k schedule follows this entry)
External link:
http://arxiv.org/abs/2204.12672
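To make the wait-k policy concrete: the decoder first reads k source tokens, then alternates writing one target token and reading one more source token, so the output lags the input by a fixed k. Below is a minimal Python sketch of that read/write schedule; translate_step is a hypothetical stand-in for the underlying translation model, not part of the paper.

    # Minimal wait-k schedule sketch. translate_step(src_prefix, tgt_prefix)
    # is a hypothetical decoder call returning the next target token.
    def wait_k_decode(source_stream, translate_step, k=3, max_len=200):
        src, tgt = [], []
        for token in source_stream:
            src.append(token)                         # READ one source token
            if len(src) >= k:                         # after the initial wait of k,
                tgt.append(translate_step(src, tgt))  # WRITE one target token
                if tgt[-1] == "</s>":
                    return tgt
        # Source exhausted: keep writing until end-of-sentence or a length cap.
        while len(tgt) < max_len and (not tgt or tgt[-1] != "</s>"):
            tgt.append(translate_step(src, tgt))
        return tgt

The fixed lag is what limitation (a) in the abstract refers to: k never adapts to how easy or hard the current input is.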
Recently, speech representation learning has improved many speech-related tasks, such as speech recognition, speech classification, and speech-to-text translation. However, all the above tasks are in the direction of speech understanding, but for the …
External link:
http://arxiv.org/abs/2203.09690