Showing 1 - 10 of 1,213 for the search: '"Mohamed AbdElrahman"'
Author:
Yang, Shu-wen, Chang, Heng-Jui, Huang, Zili, Liu, Andy T., Lai, Cheng-I, Wu, Haibin, Shi, Jiatong, Chang, Xuankai, Tsai, Hsiang-Sheng, Huang, Wen-Chin, Feng, Tzu-hsun, Chi, Po-Han, Lin, Yist Y., Chuang, Yung-Sung, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lakhotia, Kushal, Li, Shang-Wen, Mohamed, Abdelrahman, Watanabe, Shinji, Lee, Hung-yi
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of N…
External link:
http://arxiv.org/abs/2404.09385
We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transforme…
External link:
http://arxiv.org/abs/2403.16973
Author:
Alwajih, Fakhraddin, Nagoudi, El Moatez Billah, Bhatia, Gagan, Mohamed, Abdelrahman, Abdul-Mageed, Muhammad
Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, success of…
External link:
http://arxiv.org/abs/2403.01031
Author:
Lin, Chyi-Jiunn, Lin, Guan-Ting, Chuang, Yung-Sung, Wu, Wei-Lun, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Lee, Lin-shan
Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has previously been achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) probl…
External link:
http://arxiv.org/abs/2401.13463
Author:
Mohamed, Abdelrahman, Alwajih, Fakhraddin, Nagoudi, El Moatez Billah, Inciarte, Alcides Alcoba, Abdul-Mageed, Muhammad
Although image captioning has a vast array of applications, it has not reached its full potential in languages other than English. Arabic, for instance, although the native language of more than 400 million people, remains largely underrepresented in…
External link:
http://arxiv.org/abs/2311.08844
Author:
Cho, Cheol Jun, Mohamed, Abdelrahman, Li, Shang-Wen, Black, Alan W, Anumanchipalli, Gopala K.
Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space, and the units beyond phonemes are largely underexplored. Here, we…
External link:
http://arxiv.org/abs/2310.10803
Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained blackboxes, but many recent studies have begun "probing" models like HuBERT, to correl…
External link:
http://arxiv.org/abs/2310.10788
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Author:
Shi, Jiatong, Chen, William, Berrebbi, Dan, Wang, Hsiu-Hsuan, Huang, Wei-Ping, Hu, En-Pei, Chuang, Ho-Lam, Chang, Xuankai, Tang, Yuxun, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Watanabe, Shinji
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises…
External link:
http://arxiv.org/abs/2310.05513
Author:
Hsu, Po-chun, Elkahky, Ali, Hsu, Wei-Ning, Adi, Yossi, Nguyen, Tu Anh, Copet, Jade, Dupoux, Emmanuel, Lee, Hung-yi, Mohamed, Abdelrahman
Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes…
External link:
http://arxiv.org/abs/2309.17020
Author:
Tseng, Yuan, Berry, Layne, Chen, Yi-Ting, Chiu, I-Hsiang, Lin, Hsuan-Hao, Liu, Max, Peng, Puyuan, Shih, Yi-Jen, Wang, Hung-Yu, Wu, Haibin, Huang, Po-Yao, Lai, Chun-Mao, Li, Shang-Wen, Harwath, David, Tsao, Yu, Watanabe, Shinji, Mohamed, Abdelrahman, Feng, Chi-Luen, Lee, Hung-yi
Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and the generalization abilities of l…
External link:
http://arxiv.org/abs/2309.10787