Search results

Report

A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning

Authors: Gagnere, Antonin; Peeters, Geoffroy; Essid, Slim

Published in: ISMIR 2024, Nov 2024, San Francisco, California, United States

In this paper, we propose a novel Self-Supervised-Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking. Taking inspiration from the Contrastive Predictive Coding paradigm, we propose to train a Log-Mel-Spectr

External link: http://arxiv.org/abs/2411.04152


Report

An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment

Authors: Malard, Hugo; Olvera, Michel; Lathuiliere, Stéphane; Essid, Slim

Multimodal large language models have fueled progress in image captioning. These models, fine-tuned on vast image datasets, exhibit a deep understanding of semantic concepts. In this work, we show that this ability can be re-purposed for audio captio

External link: http://arxiv.org/abs/2410.05997


Report

A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification

Authors: Olvera, Michel; Stamatiadis, Paraskevas; Essid, Slim

Audio-text models trained via contrastive learning offer a practical approach to perform audio classification through natural language prompts, such as "this is a sound of" followed by category names. In this work, we explore alternative prompt templ

External link: http://arxiv.org/abs/2409.13676


Report

SALT: Standardized Audio event Label Taxonomy

Authors: Stamatiadis, Paraskevas; Olvera, Michel; Essid, Slim

Published in: DCASE, Oct 2024, Tokyo, Japan

Machine listening systems often rely on fixed taxonomies to organize and label audio data, key for training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints: they are co

External link: http://arxiv.org/abs/2409.11746


Report

Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing

Authors: Perera, David; Letzelter, Victor; Mariotte, Théo; Cortés, Adrien; Chen, Mickael; Essid, Slim; Richard, Gaël

We introduce Annealed Multiple Choice Learning (aMCL) which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-t

External link: http://arxiv.org/abs/2407.15580


Report

Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations

Authors: Zaiem, Salah; Parcollet, Titouan; Essid, Slim

Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation

External link: http://arxiv.org/abs/2407.00756


Report

Winner-takes-all learners are geometry-aware conditional density estimators

Authors: Letzelter, Victor; Perera, David; Rommel, Cédric; Fontaine, Mathieu; Essid, Slim; Richard, Gaël; Pérez, Patrick

Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing

External link: http://arxiv.org/abs/2406.04706


Report

A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

Authors: Serre, Thomas; Fontaine, Mathieu; Benhaim, Éric; Dutour, Geoffroy; Essid, Slim

Published in: ICASSP, Apr 2024, Seoul (Korea), South Korea

Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research eff

External link: http://arxiv.org/abs/2404.08022


Report

Online speaker diarization of meetings guided by speech separation

Authors: Gruttadauria, Elio; Fontaine, Mathieu; Essid, Slim

Published in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2024, Seoul (Korea), South Korea

Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with realistic dat

External link: http://arxiv.org/abs/2402.00067


Report

On the choice of the optimal temporal support for audio classification with pre-trained embeddings

Authors: Quelennec, Aurian; Olvera, Michel; Peeters, Geoffroy; Essid, Slim

Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is the subject of many recent publications. However, one aspect of

External link: http://arxiv.org/abs/2312.14005

