Search results

Report

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

Authors: Zhang, Ziqiang, Zhou, Long, Ao, Junyi, Liu, Shujie, Dai, Lirong, Li, Jinyu, Wei, Furu

The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representat

External link: http://arxiv.org/abs/2210.03730

Report

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Authors: Zhang, Ziqiang, Chen, Sanyuan, Zhou, Long, Wu, Yu, Ren, Shuo, Liu, Shujie, Yao, Zhuoyuan, Gong, Xun, Dai, Lirong, Li, Jinyu, Wei, Furu

How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) t

External link: http://arxiv.org/abs/2209.15329

Report

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

Authors: Kanda, Naoyuki, Wu, Jian, Wang, Xiaofei, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya

This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. Our framework, named t-SOT-VA, capitalizes on independently deve

External link: http://arxiv.org/abs/2209.04974

Report

DeFlowSLAM: Self-Supervised Scene Motion Decomposition for Dynamic Dense SLAM

Authors: Ye, Weicai, Yu, Xingyuan, Lan, Xinyue, Ming, Yuhang, Li, Jinyu, Bao, Hujun, Cui, Zhaopeng, Zhang, Guofeng

We present a novel dual-flow representation of scene motion that decomposes the optical flow into a static flow field caused by the camera motion and another dynamic flow field caused by the objects' movements in the scene. Based on this representati

External link: http://arxiv.org/abs/2207.08794

Report

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Authors: Wang, Chengyi, Wang, Yiming, Wu, Yu, Chen, Sanyuan, Li, Jinyu, Liu, Shujie, Wei, Furu

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to interpret. We pro

External link: http://arxiv.org/abs/2206.10125

Report

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

Authors: Zhang, Ziqiang, Ao, Junyi, Zhou, Long, Liu, Shujie, Wei, Furu, Li, Jinyu

This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese. The YiTrans system is built on large-scale pre-trained enco

External link: http://arxiv.org/abs/2206.05777

Academic article

Technical analysis of China's energy security situation

Authors: LI Jinyu, LIU Ruijian, ZHOU Chaoyang, LI Nan, TANG Fangcheng

Published in: 电力工程技术, Vol 42, Iss 6, Pp 249-255 (2023)

As an important part of the national security system, energy security is of great importance to the construction of a modern and powerful socialist country in China. Based on the definition of energy security by the International Energy Agency, an ev

External link: https://doaj.org/article/f995d428d06040759659cc92f9e8ba0e

Report

Ultra Fast Speech Separation Model with Teacher Student Learning

Authors: Chen, Sanyuan, Wu, Yu, Chen, Zhuo, Wu, Jian, Yoshioka, Takuya, Liu, Shujie, Li, Jinyu, Yu, Xiangzhan

Transformer has been successfully applied to speech separation recently with its strong long-dependency modeling capacity using a self-attention mechanism. However, Transformer tends to have heavy run-time costs due to the deep encoder layers, which

External link: http://arxiv.org/abs/2204.12777

Report

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Authors: Chen, Sanyuan, Wu, Yu, Wang, Chengyi, Liu, Shujie, Chen, Zhuo, Wang, Peidong, Liu, Gang, Li, Jinyu, Wu, Jian, Yu, Xiangzhan, Wei, Furu

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition. In this paper, we study which factor leads to the success of self-supervised l

External link: http://arxiv.org/abs/2204.12765

Report

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Authors: Xue, Jian, Wang, Peidong, Li, Jinyu, Post, Matt, Gaur, Yashesh

Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we introduce it to streaming end-to-end speech translation (ST), which aims to convert audio signals to texts in other languages directly. Compared with ca

External link: http://arxiv.org/abs/2204.05352
