Search results

Report

Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning

Authors: Jiang, Dongwei; Li, Wubo; Cao, Miao; Zou, Wei; Li, Xiangang

Self-supervised visual pretraining has shown significant progress recently. Among those methods, SimCLR greatly advanced the state of the art in self-supervised and semi-supervised learning on ImageNet. The input feature representations for speech an

External link: http://arxiv.org/abs/2010.13991

Report

TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog

Authors: Li, Wubo; Jiang, Dongwei; Zou, Wei; Li, Xiangang

Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing about a given video. The previous state-of-the-art model shows superior performance for this task using Transformer-based architecture. However, there remain some

External link: http://arxiv.org/abs/2010.10839

Report

DiDiSpeech: A Large Scale Mandarin Speech Corpus

Authors: Guo, Tingwei; Wen, Cheng; Jiang, Dongwei; Luo, Ne; Zhang, Ruixiong; Zhao, Shuaijiang; Li, Wubo; Gong, Cheng; Zou, Wei; Han, Kun; Li, Xiangang

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech. It consists of about 800 hours of speech data at 48kHz sampling rate from 6000 speakers and the corresponding texts. All speech data in the corpus is recorded in quie

External link: http://arxiv.org/abs/2010.09275

Report

Transformer based unsupervised pre-training for acoustic representation learning

Authors: Zhang, Ruixiong; Wu, Haiwei; Li, Wubo; Jiang, Dongwei; Zou, Wei; Li, Xiangang

Recently, a variety of acoustic tasks and related applications arised. For many acoustic tasks, the labeled data size may be limited. To handle this problem, we propose an unsupervised pre-training method using Transformer based encoder to learn a ge

External link: http://arxiv.org/abs/2007.14602

Report

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

Authors: Jiang, Dongwei; Li, Wubo; Zhang, Ruixiong; Cao, Miao; Luo, Ne; Han, Yang; Zou, Wei; Li, Xiangang

Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Cod

External link: http://arxiv.org/abs/2005.09862

Report

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation

Authors: Li, Wubo; Zou, Wei; Li, Xiangang

Multimodalities provide promising performance than unimodality in most tasks. However, learning the semantic of the representations from multimodalities efficiently is extremely challenging. To tackle this, we propose the Transformer based Cross-moda

External link: http://arxiv.org/abs/1911.05186

Report

Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

Authors: Jiang, Dongwei; Lei, Xiaoning; Li, Wubo; Luo, Ne; Hu, Yuxuan; Zou, Wei; Li, Xiangang

Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this p

External link: http://arxiv.org/abs/1910.09932

Academic article

Visible-light irradiation improved resistive switching characteristics of a 2D Cs2Pb(SCN)2I2-Based memristor device

Authors: Li, Wubo; Li, Wentong; Cheng, Tuo; Wang, Lei; Yao, Lianfei; Yang, Hengxiang; Zhang, Xiaoyu; Zheng, Weitao; Wang, Yinghui; Zhang, Jiaqi

Published in: Ceramics International, 1 February 2023, 49(3):4909-4918

Report

A Multi-Modal Chinese Poetry Generation Model

Authors: Liu, Dayiheng; Guo, Quan; Li, Wubo; Lv, Jiancheng

Recent studies in sequence-to-sequence learning demonstrate that RNN encoder-decoder structure can successfully generate Chinese poetry. However, existing methods can only generate poetry with a given first line or user's intent theme. In this paper,

External link: http://arxiv.org/abs/1806.09792
