Showing 1 - 10 of 24 for search: '"Abdelaziz, Ahmed Hussen"'
Author:
Aldeneh, Zakaria, Higuchi, Takuya, Jung, Jee-weon, Chen, Li-Wei, Shum, Stephen, Abdelaziz, Ahmed Hussen, Watanabe, Shinji, Likhomanenko, Tatiana, Theobald, Barry-John
Iterative self-training, or iterative pseudo-labeling (IPL)--using an improved model from the current iteration to provide pseudo-labels for the next iteration--has proven to be a powerful approach to enhance the quality of speaker representations. …
External link:
http://arxiv.org/abs/2409.10791
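The iterative pseudo-labeling loop described in the abstract above can be sketched as follows. This is a minimal toy illustration of the general IPL idea only, not the paper's speaker-representation setup: the "model" is just a running mean and the "data" are numbers, both hypothetical stand-ins.

```python
# Toy sketch of iterative pseudo-labeling (IPL): each round, the current
# model labels the unlabeled pool, and the next model is trained on the
# seed labels plus those pseudo-labels.

def train(labeled):
    """Toy 'training': the model is simply the mean of its labels."""
    return sum(y for _, y in labeled) / len(labeled)

def pseudo_label(model, unlabeled):
    """Toy 'inference': label every example with the model's value."""
    return [(x, model) for x in unlabeled]

def iterative_pseudo_label(seed_labeled, unlabeled, rounds=3):
    model = train(seed_labeled)          # initial model from seed labels
    for _ in range(rounds):
        pl = pseudo_label(model, unlabeled)
        model = train(seed_labeled + pl)  # retrain on seed + pseudo-labels
    return model
```

In a real system the model, training step, and pseudo-label filtering are far richer; the loop structure is the only part carried over here.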
Author:
Chen, Li-Wei, Higuchi, Takuya, Bai, He, Abdelaziz, Ahmed Hussen, Rudnicky, Alexander, Watanabe, Shinji, Likhomanenko, Tatiana, Theobald, Barry-John, Aldeneh, Zakaria
Speech foundation models, such as HuBERT and its variants, are pre-trained on large amounts of unlabeled speech for various downstream tasks. These models use a masked prediction objective, where the model learns to predict information about masked …
External link:
http://arxiv.org/abs/2409.10788
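The masked prediction objective mentioned in the abstract above can be sketched in miniature. This is an assumed toy rendering of the general idea, not HuBERT's actual loss: frames and targets are integers, masking is random, and the loss is a 0/1 error rate over masked positions only.

```python
import random

# Toy sketch of a masked-prediction objective: random frames are masked,
# and the model is scored only on its predictions at the masked positions.

def mask_frames(frames, mask_prob=0.5, rng=None):
    """Replace randomly chosen frames with None; return masked sequence
    and the list of masked positions."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    masked, positions = [], []
    for i, f in enumerate(frames):
        if rng.random() < mask_prob:
            masked.append(None)
            positions.append(i)
        else:
            masked.append(f)
    return masked, positions

def masked_prediction_loss(predictions, targets, positions):
    """0/1 loss computed over masked positions only."""
    errors = sum(predictions[i] != targets[i] for i in positions)
    return errors / max(len(positions), 1)
```

Real systems predict discrete cluster targets with a cross-entropy loss over learned representations; only the mask-then-predict structure is illustrated here.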
Author:
Palaskar, Shruti, Rudovic, Oggi, Dharur, Sameer, Pesce, Florian, Krishna, Gautam, Sivaraman, Aswin, Berkowitz, Jack, Abdelaziz, Ahmed Hussen, Adya, Saurabh, Tewfik, Ahmed
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal …
External link:
http://arxiv.org/abs/2406.09617
Author:
Kumar, Satyam, Buddi, Sai Srujana, Sarawgi, Utkarsh Oggy, Garg, Vineet, Ranjan, Shivesh, Rudovic, Ognjen, Abdelaziz, Ahmed Hussen, Adya, Saurabh
Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need …
External link:
http://arxiv.org/abs/2406.09443
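The VAD task described in the abstract above reduces, in its simplest classical form, to thresholding short-term frame energy. The sketch below is an assumed baseline to illustrate the task's input/output shape; the paper concerns learned, personalized VAD models, not this heuristic.

```python
# Toy energy-threshold VAD: a frame is flagged as speech when its
# short-term energy exceeds a fixed threshold.

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def energy_vad(frames, threshold=0.01):
    """Return one speech/non-speech boolean per frame."""
    return [frame_energy(f) > threshold for f in frames]
```

A learned VAD replaces the energy feature and fixed threshold with a trained classifier, but the per-frame binary decision interface is the same.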
Author:
Aldeneh, Zakaria, Higuchi, Takuya, Jung, Jee-weon, Seto, Skyler, Likhomanenko, Tatiana, Shum, Stephen, Abdelaziz, Ahmed Hussen, Watanabe, Shinji, Theobald, Barry-John
Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-bank features as inputs, and thus, training them on top of self-supervised features …
External link:
http://arxiv.org/abs/2402.00340
Author:
Jung, Jee-weon, Zhang, Wangyou, Shi, Jiatong, Aldeneh, Zakaria, Higuchi, Takuya, Theobald, Barry-John, Abdelaziz, Ahmed Hussen, Watanabe, Shinji
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. …
External link:
http://arxiv.org/abs/2401.17230
Author:
Krishna, Gautam, Dharur, Sameer, Rudovic, Oggi, Dighe, Pranay, Adya, Saurabh, Abdelaziz, Ahmed Hussen, Tewfik, Ahmed H
Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text …
External link:
http://arxiv.org/abs/2310.15261
Author:
Abdelaziz, Ahmed Hussen, Kumar, Anushree Prasanna, Seivwright, Chloe, Fanelli, Gabriele, Binder, Justin, Stylianou, Yannis, Kajarekar, Sachin
Audiovisual speech synthesis is the problem of synthesizing a talking face while maximizing the coherency of the acoustic and visual speech. In this paper, we propose and compare two audiovisual speech synthesis systems for 3D face models. The first …
External link:
http://arxiv.org/abs/2008.00620
Author:
Abdelaziz, Ahmed Hussen, Theobald, Barry-John, Dixon, Paul, Knothe, Reinhard, Apostoloff, Nicholas, Kajarekar, Sachin
We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated …
External link:
http://arxiv.org/abs/2005.13616
Author:
Aldeneh, Zakaria, Kumar, Anushree Prasanna, Theobald, Barry-John, Marchi, Erik, Kajarekar, Sachin, Naik, Devang, Abdelaziz, Ahmed Hussen
We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual …
External link:
http://arxiv.org/abs/2004.12031