Showing 1 - 10 of 62 for search: '"Yuan, Jiahong"'
Author:
Xun, Guangxu, Ma, Mingbo, Bian, Yuchen, Cai, Xingyu, Huang, Jiaji, Zheng, Renjie, Chen, Junkun, Yuan, Jiahong, Church, Kenneth, Huang, Liang
In simultaneous translation (SimulMT), the most widely used strategy is the wait-k policy thanks to its simplicity and effectiveness in balancing translation quality and latency. However, wait-k suffers from two major limitations: (a) it is a fixed p
External link:
http://arxiv.org/abs/2204.12672
Author:
Pan, Xiongfeng, Yuan, Jiahong
Published in:
In Technological Forecasting & Social Change, September 2024, 206
We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. Different types of
External link:
http://arxiv.org/abs/2108.01132
Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end approach. Unlike English where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring a
External link:
http://arxiv.org/abs/2108.01129
This study reports our efforts to improve automatic recognition of suprasegmentals by fine-tuning wav2vec 2.0 with CTC, a method that has been successful in automatic speech recognition. We demonstrate that the method can improve the state-of-the-art
External link:
http://arxiv.org/abs/2108.01122
With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesize video from the text. The method builds a phoneme
External link:
http://arxiv.org/abs/2104.14631
Author:
Zheng, Renjie, Ma, Mingbo, Zheng, Baigong, Liu, Kaibo, Yuan, Jiahong, Church, Kenneth, Huang, Liang
Published in:
Findings of EMNLP 2020
Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with only a few seconds delay. In addition, it needs to continuously
External link:
http://arxiv.org/abs/2010.10048
The differences in written text and conversational speech are substantial; previous parsers trained on treebanked text have given very poor results on spontaneous speech. For spoken language, the mismatch in style also extends to prosodic cues, thoug
External link:
http://arxiv.org/abs/2010.04288
Author:
Kassab, Lara, Kryshchenko, Alona, Lyu, Hanbaek, Molitor, Denali, Needell, Deanna, Rebrova, Elizaveta, Yuan, Jiahong
Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and c
External link:
http://arxiv.org/abs/2010.01600