Showing 1 - 10 of 36 for search: '"Li, Wubo"'
Self-supervised visual pretraining has shown significant progress recently. Among those methods, SimCLR greatly advanced the state of the art in self-supervised and semi-supervised learning on ImageNet. The input feature representations for speech an
External link:
http://arxiv.org/abs/2010.13991
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing a given video. The previous state-of-the-art model shows superior performance for this task using a Transformer-based architecture. However, there remain some
External link:
http://arxiv.org/abs/2010.10839
Author:
Guo, Tingwei, Wen, Cheng, Jiang, Dongwei, Luo, Ne, Zhang, Ruixiong, Zhao, Shuaijiang, Li, Wubo, Gong, Cheng, Zou, Wei, Han, Kun, Li, Xiangang
This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech. It consists of about 800 hours of speech data at a 48 kHz sampling rate from 6000 speakers and the corresponding texts. All speech data in the corpus is recorded in quie
External link:
http://arxiv.org/abs/2010.09275
Recently, a variety of acoustic tasks and related applications have arisen. For many acoustic tasks, the labeled data size may be limited. To handle this problem, we propose an unsupervised pre-training method using a Transformer-based encoder to learn a ge
External link:
http://arxiv.org/abs/2007.14602
Author:
Jiang, Dongwei, Li, Wubo, Zhang, Ruixiong, Cao, Miao, Luo, Ne, Han, Yang, Zou, Wei, Li, Xiangang
Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Cod
External link:
http://arxiv.org/abs/2005.09862
Multimodal approaches provide better performance than unimodal ones in most tasks. However, efficiently learning the semantics of multimodal representations is extremely challenging. To tackle this, we propose the Transformer based Cross-moda
External link:
http://arxiv.org/abs/1911.05186
Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this p
External link:
http://arxiv.org/abs/1910.09932
Author:
Li, Wubo, Li, Wentong, Cheng, Tuo, Wang, Lei, Yao, Lianfei, Yang, Hengxiang, Zhang, Xiaoyu, Zheng, Weitao, Wang, Yinghui, Zhang, Jiaqi
Published in:
In Ceramics International 1 February 2023 49(3):4909-4918
Recent studies in sequence-to-sequence learning demonstrate that the RNN encoder-decoder structure can successfully generate Chinese poetry. However, existing methods can only generate poetry from a given first line or a user's intended theme. In this paper,
External link:
http://arxiv.org/abs/1806.09792