Showing 1 - 10 of 17 results for search: '"Berrebbi, Dan"'
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Author:
Shi, Jiatong, Chen, William, Berrebbi, Dan, Wang, Hsiu-Hsuan, Huang, Wei-Ping, Hu, En-Pei, Chuang, Ho-Lam, Chang, Xuankai, Tang, Yuxun, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Watanabe, Shinji
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises…
External link:
http://arxiv.org/abs/2310.05513
Author:
Chen, William, Shi, Jiatong, Yan, Brian, Berrebbi, Dan, Zhang, Wangyou, Peng, Yifan, Chang, Xuankai, Maiti, Soumi, Watanabe, Shinji
Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA) methods due to the expenses and complexity required to handle many languages. This further harms the reproducibility of SSL, which is already limited to few…
External link:
http://arxiv.org/abs/2309.15317
Author:
Peng, Yifan, Tian, Jinchuan, Yan, Brian, Berrebbi, Dan, Chang, Xuankai, Li, Xinjian, Shi, Jiatong, Arora, Siddhant, Chen, William, Sharma, Roshan, Zhang, Wangyou, Sudo, Yui, Shakeel, Muhammad, Jung, Jee-weon, Maiti, Soumi, Watanabe, Shinji
Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation…
External link:
http://arxiv.org/abs/2309.13876
Author:
Shi, Jiatong, Berrebbi, Dan, Chen, William, Chung, Ho-Lam, Hu, En-Pei, Huang, Wei Ping, Chang, Xuankai, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Watanabe, Shinji
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation.
External link:
http://arxiv.org/abs/2305.10615
Author:
Yan, Brian, Shi, Jiatong, Tang, Yun, Inaguma, Hirofumi, Peng, Yifan, Dalmia, Siddharth, Polák, Peter, Fernandes, Patrick, Berrebbi, Dan, Hayashi, Tomoki, Zhang, Xiaohui, Ni, Zhaoheng, Hira, Moto, Maiti, Soumi, Pino, Juan, Watanabe, Shinji
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text…
External link:
http://arxiv.org/abs/2304.04596
Self-supervised learning (SSL) models reshaped our approach to speech, language and vision. However, their huge size and the opaque relations between their layers and tasks result in slow inference and network overthinking, where predictions made from…
External link:
http://arxiv.org/abs/2211.08989
Self-training (ST) and self-supervised learning (SSL) methods have demonstrated strong improvements in automatic speech recognition (ASR). In spite of these advances, to the best of our knowledge, there is no analysis of how the composition of the…
External link:
http://arxiv.org/abs/2211.00854
Self-training (ST), or pseudo-labeling, has sparked significant interest in the automatic speech recognition (ASR) community recently because of its success in harnessing unlabeled data. Unlike prior semi-supervised learning approaches that relied on…
External link:
http://arxiv.org/abs/2210.08711
Author:
Berrebbi, Dan, Shi, Jiatong, Yan, Brian, Lopez-Francisco, Osbel, Amith, Jonathan D., Watanabe, Shinji
Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends highly on the relatedness between…
External link:
http://arxiv.org/abs/2204.02470
Author:
Yan, Brian, Zhang, Chunlei, Yu, Meng, Zhang, Shi-Xiong, Dalmia, Siddharth, Berrebbi, Dan, Weng, Chao, Watanabe, Shinji, Yu, Dong
Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and…
External link:
http://arxiv.org/abs/2111.15016