Showing 1 - 10 of 1,996 for search: '"Essid A"'
Author:
Perera, David, Letzelter, Victor, Mariotte, Théo, Cortés, Adrien, Chen, Mickael, Essid, Slim, Richard, Gaël
We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL. MCL is a learning framework that handles ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-takes-all…
External link:
http://arxiv.org/abs/2407.15580
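The winner-takes-all rule at the heart of MCL can be sketched in a few lines. This is an illustrative toy (scalar hypotheses, a three-mode ambiguous target, made-up hyperparameters), not the paper's method or its annealed variant:

```python
import numpy as np

# Toy sketch of Winner-takes-all (WTA) training for Multiple Choice
# Learning: K hypotheses, but only the closest one ("winner") is updated
# for each target. All names and values here are illustrative.

rng = np.random.default_rng(0)
K, lr, steps = 3, 0.1, 2000

# Ambiguous task: each input admits one of three plausible targets.
modes = np.array([-2.0, 0.0, 2.0])
hypotheses = rng.normal(size=K)  # K trainable scalar hypothesis heads

for _ in range(steps):
    y = rng.choice(modes)               # sample an ambiguous target
    errors = (hypotheses - y) ** 2      # per-hypothesis squared loss
    winner = np.argmin(errors)          # winner takes all
    # gradient step on the winner only; the other hypotheses are untouched
    hypotheses[winner] -= lr * 2 * (hypotheses[winner] - y)

# Each hypothesis specializes on one mode of the target distribution.
print(np.sort(hypotheses))
```

Because only the winner moves, each head specializes on one mode instead of all heads regressing to the mean.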
Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation…
External link:
http://arxiv.org/abs/2407.00756
Author:
Letzelter, Victor, Perera, David, Rommel, Cédric, Fontaine, Mathieu, Essid, Slim, Richard, Gaël, Pérez, Patrick
Winner-takes-all training is a simple learning paradigm that handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing…
External link:
http://arxiv.org/abs/2406.04706
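The centroidal Voronoi view mentioned in this abstract can be illustrated with Lloyd's algorithm: assign each target to its nearest hypothesis (the "winner"), then move each hypothesis to the centroid of its cell. A minimal sketch, with made-up data and hypothesis counts, not the paper's construction:

```python
import numpy as np

# Lloyd's algorithm: the batch analogue of winner-takes-all assignment.
# Points play the role of targets; "hyps" are the hypotheses/generators.

rng = np.random.default_rng(1)
points = rng.uniform(-1, 1, size=(1000, 2))  # targets to be quantized
hyps = rng.uniform(-1, 1, size=(4, 2))       # 4 hypotheses (illustrative)

for _ in range(50):
    # assignment step: each point's winner is its nearest hypothesis
    d = np.linalg.norm(points[:, None, :] - hyps[None, :, :], axis=-1)
    win = d.argmin(axis=1)
    # update step: move each hypothesis to the centroid of its Voronoi cell
    for k in range(len(hyps)):
        if np.any(win == k):
            hyps[k] = points[win == k].mean(axis=0)

# mean quantization error of the resulting tessellation
d = np.linalg.norm(points[:, None, :] - hyps[None, :, :], axis=-1)
err = d.min(axis=1).mean()
print(err)
```

At convergence each hypothesis sits at the centroid of its own cell, i.e. the configuration is a centroidal Voronoi tessellation of the target distribution.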
Published in:
ICASSP, Apr 2024, Seoul, South Korea
Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research eff…
External link:
http://arxiv.org/abs/2404.08022
Published in:
IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2024, Seoul, South Korea
Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with realistic data…
External link:
http://arxiv.org/abs/2402.00067
Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is the subject of many recent publications. However, one aspect of…
External link:
http://arxiv.org/abs/2312.14005
Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically effectuate robust features by means of Domain Randomization…
External link:
http://arxiv.org/abs/2312.09788
Author:
Letzelter, Victor, Fontaine, Mathieu, Chen, Mickaël, Pérez, Patrick, Essid, Slim, Richard, Gaël
Published in:
Advances in Neural Information Processing Systems, Dec 2023, New Orleans, United States
We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple…
External link:
http://arxiv.org/abs/2311.01052
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate…
External link:
http://arxiv.org/abs/2308.14456
Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset, which inevitably occur when the microphones…
External link:
http://arxiv.org/abs/2307.16582
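The sampling rate offset mentioned in this abstract is small in relative terms but accumulates quickly. A back-of-the-envelope sketch with hypothetical numbers (the 50 ppm clock mismatch and sample rate are illustrative, not from the paper):

```python
# Illustrative sketch: how a tiny sampling rate offset (SRO) between two
# ad-hoc recording devices accumulates into sample-level misalignment.

fs_ref = 16000.0   # reference device sample rate (Hz), assumed value
sro_ppm = 50.0     # hypothetical 50 ppm clock mismatch between devices
fs_dev = fs_ref * (1 + sro_ppm * 1e-6)

duration = 60.0    # seconds of recording
# Drift in samples between the two recordings after `duration` seconds:
drift = duration * (fs_dev - fs_ref)
print(f"drift after {duration:.0f} s: {drift:.1f} samples")
```

Even a 50 ppm mismatch yields tens of samples of drift per minute, enough to break the phase alignment that multichannel enhancement methods rely on, which is why SRO must be estimated and compensated.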