Výsledky vyhledávání

Report

Investigation of Speaker Representation for Target-Speaker Speech Processing

Autor: Ashihara, Takanori, Moriya, Takafumi, Horiguchi, Shota, Peng, Junyi, Ochiai, Tsubasa, Delcroix, Marc, Matsuura, Kohei, Sato, Hiroshi

Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection (p-VAD), are important for extracting information about a desired speaker

Externí odkaz: http://arxiv.org/abs/2410.11243

Zobrazit plný text záznamu

Report

State-of-the-art Embeddings with Video-free Segmentation of the Source VoxCeleb Data

Autor: Barahona, Sara, Mošner, Ladislav, Stafylakis, Themos, Plchot, Oldřich, Peng, Junyi, Burget, Lukáš, Černocký, Jan

In this paper, we refine and validate our method for training speaker embedding extractors using weak annotations. More specifically, we use only the audio stream of the source VoxCeleb videos and the names of the celebrities without knowing the time

Externí odkaz: http://arxiv.org/abs/2410.02364

Zobrazit plný text záznamu

Report

CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification

Autor: Peng, Junyi, Mošner, Ladislav, Zhang, Lin, Plchot, Oldřich, Stafylakis, Themos, Burget, Lukáš, Černocký, Jan

Self-supervised learning (SSL) models for speaker verification (SV) have gained significant attention in recent years. However, existing SSL-based SV systems often struggle to capture local temporal dependencies and generalize across different tasks.

Externí odkaz: http://arxiv.org/abs/2409.15234

Zobrazit plný text záznamu

Report

BUT Systems and Analyses for the ASVspoof 5 Challenge

Autor: Rohdin, Johan, Zhang, Lin, Plchot, Oldřich, Staněk, Vojtěch, Mihola, David, Peng, Junyi, Stafylakis, Themos, Beveraki, Dmitriy, Silnova, Anna, Brukner, Jan, Burget, Lukáš

This paper describes the BUT submitted systems for the ASVspoof 5 challenge, along with analyses. For the conventional deepfake detection task, we use ResNet18 and self-supervised models for the closed and open conditions, respectively. In addition,

Externí odkaz: http://arxiv.org/abs/2408.11152

Zobrazit plný text záznamu

Report

Probing Self-supervised Learning Models with Target Speech Extraction

Autor: Peng, Junyi, Delcroix, Marc, Ochiai, Tsubasa, Plchot, Oldrich, Ashihara, Takanori, Araki, Shoko, Cernocky, Jan

Large-scale pre-trained self-supervised learning (SSL) models have shown remarkable advancements in speech-related tasks. However, the utilization of these models in complex multi-talker scenarios, such as extracting a target speaker in a mixture, is

Externí odkaz: http://arxiv.org/abs/2402.13200

Zobrazit plný text záznamu

Report

Target Speech Extraction with Pre-trained Self-supervised Learning Models

Autor: Peng, Junyi, Delcroix, Marc, Ochiai, Tsubasa, Plchot, Oldrich, Araki, Shoko, Cernocky, Jan

Pre-trained self-supervised learning (SSL) models have achieved remarkable success in various speech tasks. However, their potential in target speech extraction (TSE) has not been fully exploited. TSE aims to extract the speech of a target speaker in

Externí odkaz: http://arxiv.org/abs/2402.13199

Zobrazit plný text záznamu

Report

Improving Speaker Verification with Self-Pretrained Transformer Models

Autor: Peng, Junyi, Plchot, Oldřich, Stafylakis, Themos, Mošner, Ladislav, Burget, Lukáš, Černocký, Jan

Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received a rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from th

Externí odkaz: http://arxiv.org/abs/2305.10517

Zobrazit plný text záznamu

Report

Probing Deep Speaker Embeddings for Speaker-related Tasks

Autor: Zhao, Zifeng, Pan, Ding, Peng, Junyi, Gu, Rongzhi

Deep speaker embeddings have shown promising results in speaker recognition, as well as in other speaker-related tasks. However, some issues are still under explored, for instance, the information encoded in these representations and their influence

Externí odkaz: http://arxiv.org/abs/2212.07068

Zobrazit plný text záznamu

Report

Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

Autor: Peng, Junyi, Stafylakis, Themos, Gu, Rongzhi, Plchot, Oldřich, Mošner, Ladislav, Burget, Lukáš, Černocký, Jan

Recently, the pre-trained Transformer models have received a rising interest in the field of speech processing thanks to their great success in various downstream tasks. However, most fine-tuning approaches update all the parameters of the pre-traine

Externí odkaz: http://arxiv.org/abs/2210.16032

Zobrazit plný text záznamu

Report

An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification

Autor: Peng, Junyi, Plchot, Oldrich, Stafylakis, Themos, Mosner, Ladislav, Burget, Lukas, Cernocky, Jan

In recent years, self-supervised learning paradigm has received extensive attention due to its great success in various down-stream tasks. However, the fine-tuning strategies for adapting those pre-trained models to speaker verification task have yet

Externí odkaz: http://arxiv.org/abs/2210.01273

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání