Showing 1 - 8 of 8
for search: '"Lin, Yist Y."'
Author:
Yang, Shu-wen, Chang, Heng-Jui, Huang, Zili, Liu, Andy T., Lai, Cheng-I, Wu, Haibin, Shi, Jiatong, Chang, Xuankai, Tsai, Hsiang-Sheng, Huang, Wen-Chin, Feng, Tzu-hsun, Chi, Po-Han, Lin, Yist Y., Chuang, Yung-Sung, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lakhotia, Kushal, Li, Shang-Wen, Mohamed, Abdelrahman, Watanabe, Shinji, Lee, Hung-yi
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of …
External link:
http://arxiv.org/abs/2404.09385
End-to-end (E2E) systems have shown comparable performance to hybrid systems for automatic speech recognition (ASR). Word timings, as a by-product of ASR, are essential in many applications, especially for subtitling and computer-aided pronunciation …
External link:
http://arxiv.org/abs/2306.07949
Author:
Lin, Yist Y., Han, Tao, Xu, Haihua, Pham, Van Tung, Khassanov, Yerbolat, Chong, Tze Yuang, He, Yi, Lu, Lu, Ma, Zejun
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when train-test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based …
External link:
http://arxiv.org/abs/2210.15876
Author:
Yang, Shu-wen, Chi, Po-Han, Chuang, Yung-Sung, Lai, Cheng-I Jeff, Lakhotia, Kushal, Lin, Yist Y., Liu, Andy T., Shi, Jiatong, Chang, Xuankai, Lin, Guan-Ting, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lee, Ko-tik, Liu, Da-Rong, Huang, Zili, Dong, Shuyan, Li, Shang-Wen, Watanabe, Shinji, Mohamed, Abdelrahman, Lee, Hung-yi
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for …
External link:
http://arxiv.org/abs/2105.01051
Speech quality assessment has been a critical issue in speech processing for decades. Existing automatic evaluations usually require clean references or parallel ground truth data, which is infeasible when the amount of data soars. Subjective tests, …
External link:
http://arxiv.org/abs/2104.03017
Any-to-any voice conversion (VC) aims to convert the timbre of utterances from and to any speakers seen or unseen during training. Various any-to-any VC approaches have been proposed, such as AUTOVC, AdaINVC, and FragmentVC. AUTOVC and AdaINVC utilize …
External link:
http://arxiv.org/abs/2104.02901
Any-to-any voice conversion aims to convert the voice from and to any speaker, even those unseen during training, which is much more challenging than one-to-one or many-to-many tasks but much more attractive in real-world scenarios. In this paper …
External link:
http://arxiv.org/abs/2010.14150
Substantial improvements have been achieved in recent years in voice conversion, which converts the speaker characteristics of an utterance into those of another speaker without changing the linguistic content of the utterance. Nonetheless, …
External link:
http://arxiv.org/abs/2005.08781