Showing 1 - 10 of 73 for search: '"Chang, Heng-Jui"'
Author:
Chang, Heng-Jui
Despite success across various tasks, self-supervised speech models face significant challenges in enhancing content-related performance with unlabeled data, requiring substantial computational resources. Meanwhile, learning from clustered discrete …
Author:
Yang, Shu-wen, Chang, Heng-Jui, Huang, Zili, Liu, Andy T., Lai, Cheng-I, Wu, Haibin, Shi, Jiatong, Chang, Xuankai, Tsai, Hsiang-Sheng, Huang, Wen-Chin, Feng, Tzu-hsun, Chi, Po-Han, Lin, Yist Y., Chuang, Yung-Sung, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lakhotia, Kushal, Li, Shang-Wen, Mohamed, Abdelrahman, Watanabe, Shinji, Lee, Hung-yi
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of …
External link:
http://arxiv.org/abs/2404.09385
Author:
Wang, Hsuan-Fu, Shih, Yi-Jen, Chang, Heng-Jui, Berry, Layne, Peng, Puyuan, Lee, Hung-yi, Wang, Hsin-Min, Harwath, David
The recently proposed visually grounded speech model SpeechCLIP is an innovative framework that bridges speech and text through images via CLIP without relying on text transcription. On this basis, this paper introduces two extensions to SpeechCLIP.
External link:
http://arxiv.org/abs/2402.06959
Author:
Chang, Heng-Jui, Glass, James
This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific self-supervision method for speaker- and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin). R-Spin resolves …
External link:
http://arxiv.org/abs/2311.09117
Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks. Due to the high cost of developing these large models, building new encoders for new tasks and deploying them to …
External link:
http://arxiv.org/abs/2309.07707
Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging. We propose speaker-invariant clustering (Spin), a novel self-supervised learning method …
External link:
http://arxiv.org/abs/2305.11072
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement …
External link:
http://arxiv.org/abs/2305.10005
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval. For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin …
External link:
http://arxiv.org/abs/2211.01180
Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly. Therefore, we propose SpeechCLIP, a novel framework bridging speech and text through images to …
External link:
http://arxiv.org/abs/2210.00705
Author:
Tsai, Hsiang-Sheng, Chang, Heng-Jui, Huang, Wen-Chin, Huang, Zili, Lakhotia, Kushal, Yang, Shu-wen, Dong, Shuyan, Liu, Andy T., Lai, Cheng-I Jeff, Shi, Jiatong, Chang, Xuankai, Hall, Phil, Chen, Hsuan-Jui, Li, Shang-Wen, Watanabe, Shinji, Mohamed, Abdelrahman, Lee, Hung-yi
Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, …
External link:
http://arxiv.org/abs/2203.06849