Showing 1 - 10 of 184
for search: '"Huang Zili"'
Author:
Yang, Shu-wen, Chang, Heng-Jui, Huang, Zili, Liu, Andy T., Lai, Cheng-I, Wu, Haibin, Shi, Jiatong, Chang, Xuankai, Tsai, Hsiang-Sheng, Huang, Wen-Chin, Feng, Tzu-hsun, Chi, Po-Han, Lin, Yist Y., Chuang, Yung-Sung, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lakhotia, Kushal, Li, Shang-Wen, Mohamed, Abdelrahman, Watanabe, Shinji, Lee, Hung-yi
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of N…
External link:
http://arxiv.org/abs/2404.09385
The speech field is evolving to solve more challenging scenarios, such as multi-channel recordings with multiple simultaneous talkers. Given the many types of microphone setups out there, we present the UniX-Encoder. It's a universal encoder designed…
External link:
http://arxiv.org/abs/2310.16367
Author:
Huang, Zili, Chen, Zhuo, Kanda, Naoyuki, Wu, Jian, Wang, Yiming, Li, Jinyu, Yoshioka, Takuya, Wang, Xiaofei, Wang, Peidong
Self-supervised learning (SSL), which utilizes the input data itself for representation learning, has achieved state-of-the-art results for various downstream speech tasks. However, most of the previous studies focused on offline single-talker applic…
External link:
http://arxiv.org/abs/2211.05564
Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performa…
External link:
http://arxiv.org/abs/2211.00482
Author:
Feng, Tzu-hsun, Dong, Annie, Yeh, Ching-Feng, Yang, Shu-wen, Lin, Tzu-Quan, Shi, Jiatong, Chang, Kai-Wei, Huang, Zili, Wu, Haibin, Chang, Xuankai, Watanabe, Shinji, Mohamed, Abdelrahman, Li, Shang-Wen, Lee, Hung-yi
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to measure the com…
External link:
http://arxiv.org/abs/2210.08634
Published in:
In Applied Energy 1 November 2024 373
Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise while speech separation extracts target speech from interfering speakers. Despite a great number of supervised le…
External link:
http://arxiv.org/abs/2203.07960
Author:
Tsai, Hsiang-Sheng, Chang, Heng-Jui, Huang, Wen-Chin, Huang, Zili, Lakhotia, Kushal, Yang, Shu-wen, Dong, Shuyan, Liu, Andy T., Lai, Cheng-I Jeff, Shi, Jiatong, Chang, Xuankai, Hall, Phil, Chen, Hsuan-Jui, Li, Shang-Wen, Watanabe, Shinji, Mohamed, Abdelrahman, Lee, Hung-yi
Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the…
External link:
http://arxiv.org/abs/2203.06849
Published in:
In Cities August 2024 151
Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech. However, the original model requires a fixed (and known) number of speakers, which limits its application to re…
External link:
http://arxiv.org/abs/2108.03342