Showing 1 - 10 of 1,213 for the search: '"Mohamed AbdElrahman"'
Author:
Yang, Shu-wen, Chang, Heng-Jui, Huang, Zili, Liu, Andy T., Lai, Cheng-I, Wu, Haibin, Shi, Jiatong, Chang, Xuankai, Tsai, Hsiang-Sheng, Huang, Wen-Chin, Feng, Tzu-hsun, Chi, Po-Han, Lin, Yist Y., Chuang, Yung-Sung, Huang, Tzu-Hsien, Tseng, Wei-Cheng, Lakhotia, Kushal, Li, Shang-Wen, Mohamed, Abdelrahman, Watanabe, Shinji, Lee, Hung-yi
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of N…
External link:
http://arxiv.org/abs/2404.09385
We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transforme…
External link:
http://arxiv.org/abs/2403.16973
Author:
Alwajih, Fakhraddin, Nagoudi, El Moatez Billah, Bhatia, Gagan, Mohamed, Abdelrahman, Abdul-Mageed, Muhammad
Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, success of…
External link:
http://arxiv.org/abs/2403.01031
Author:
Lin, Chyi-Jiunn, Lin, Guan-Ting, Chuang, Yung-Sung, Wu, Wei-Lun, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Lee, Lin-shan
Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has previously been achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) probl…
External link:
http://arxiv.org/abs/2401.13463
Author:
Mohamed, Abdelrahman, Alwajih, Fakhraddin, Nagoudi, El Moatez Billah, Inciarte, Alcides Alcoba, Abdul-Mageed, Muhammad
Although image captioning has a vast array of applications, it has not reached its full potential in languages other than English. Arabic, for instance, although the native language of more than 400 million people, remains largely underrepresented in…
External link:
http://arxiv.org/abs/2311.08844
Author:
Cho, Cheol Jun, Mohamed, Abdelrahman, Li, Shang-Wen, Black, Alan W, Anumanchipalli, Gopala K.
Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space, and the units beyond phonemes are largely underexplored. Here, we…
External link:
http://arxiv.org/abs/2310.10803
Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained blackboxes, but many recent studies have begun "probing" models like HuBERT, to correl…
External link:
http://arxiv.org/abs/2310.10788
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Author:
Shi, Jiatong, Chen, William, Berrebbi, Dan, Wang, Hsiu-Hsuan, Huang, Wei-Ping, Hu, En-Pei, Chuang, Ho-Lam, Chang, Xuankai, Tang, Yuxun, Li, Shang-Wen, Mohamed, Abdelrahman, Lee, Hung-yi, Watanabe, Shinji
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises…
External link:
http://arxiv.org/abs/2310.05513
Author:
Hsu, Po-chun, Elkahky, Ali, Hsu, Wei-Ning, Adi, Yossi, Nguyen, Tu Anh, Copet, Jade, Dupoux, Emmanuel, Lee, Hung-yi, Mohamed, Abdelrahman
Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes…
External link:
http://arxiv.org/abs/2309.17020
Author:
Tseng, Yuan, Berry, Layne, Chen, Yi-Ting, Chiu, I-Hsiang, Lin, Hsuan-Hao, Liu, Max, Peng, Puyuan, Shih, Yi-Jen, Wang, Hung-Yu, Wu, Haibin, Huang, Po-Yao, Lai, Chun-Mao, Li, Shang-Wen, Harwath, David, Tsao, Yu, Watanabe, Shinji, Mohamed, Abdelrahman, Feng, Chi-Luen, Lee, Hung-yi
Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and the generalization abilities of l…
External link:
http://arxiv.org/abs/2309.10787