Published in:
ISMIR 2024, Nov 2024, San Francisco, California, United States
In this paper, we propose a novel Self-Supervised-Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking. Taking inspiration from the Contrastive Predictive Coding paradigm, we propose to train a Log-Mel-Spectr
External link:
http://arxiv.org/abs/2411.04152
Multimodal large language models have fueled progress in image captioning. These models, fine-tuned on vast image datasets, exhibit a deep understanding of semantic concepts. In this work, we show that this ability can be re-purposed for audio captio
External link:
http://arxiv.org/abs/2410.05997
Audio-text models trained via contrastive learning offer a practical approach to perform audio classification through natural language prompts, such as "this is a sound of" followed by category names. In this work, we explore alternative prompt templ
External link:
http://arxiv.org/abs/2409.13676
Published in:
DCASE, Oct 2024, Tokyo, Japan
Machine listening systems often rely on fixed taxonomies to organize and label audio data, key for training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints: they are co
External link:
http://arxiv.org/abs/2409.11746
Author:
Perera, David, Letzelter, Victor, Mariotte, Théo, Cortés, Adrien, Chen, Mickael, Essid, Slim, Richard, Gaël
We introduce Annealed Multiple Choice Learning (aMCL) which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-t
External link:
http://arxiv.org/abs/2407.15580
Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation
External link:
http://arxiv.org/abs/2407.00756
Author:
Letzelter, Victor, Perera, David, Rommel, Cédric, Fontaine, Mathieu, Essid, Slim, Richard, Gaël, Pérez, Patrick
Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing
External link:
http://arxiv.org/abs/2406.04706
Published in:
ICASSP, Apr 2024, Seoul (Korea), South Korea
Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research eff
External link:
http://arxiv.org/abs/2404.08022
Published in:
IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2024, Seoul (Korea), South Korea
Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with realistic dat
External link:
http://arxiv.org/abs/2402.00067
Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is the subject of many recent publications. However, one aspect of
External link:
http://arxiv.org/abs/2312.14005