Search results - "Schmid, Florian A."

Report

Effective Pre-Training of Audio Transformers for Sound Event Detection

Authors: Schmid, Florian; Morocutti, Tobias; Foscarin, Francesco; Schlüter, Jan; Primus, Paul; Widmer, Gerhard

We propose a pre-training pipeline for audio spectrogram transformers for frame-level sound event detection tasks. On top of common pre-training steps, we add a meticulously designed training routine on AudioSet frame-level annotations. This includes

External link: http://arxiv.org/abs/2409.09546


Report

Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval

Authors: Primus, Paul; Schmid, Florian; Widmer, Gerhard

Dual-encoder-based audio retrieval systems are commonly optimized with contrastive learning on a set of matching and mismatching audio-caption pairs. This leads to a shared embedding space in which corresponding items from the two modalities end up c

External link: http://arxiv.org/abs/2408.11641


Report

Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining

Authors: Greif, Jonathan; Schmid, Florian; Primus, Paul; Widmer, Gerhard

Query-by-Vocal Imitation (QBV) is about searching audio files within databases using vocal imitations created by the user's voice. Since most humans can effectively communicate sound concepts through voice, QBV offers the more intuitive and convenien

External link: http://arxiv.org/abs/2408.11638


Report

Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training

Authors: Schmid, Florian; Primus, Paul; Morocutti, Tobias; Greif, Jonathan; Widmer, Gerhard

This technical report describes the CP-JKU team's submission for Task 4 Sound Event Detection with Heterogeneous Training Datasets and Potentially Missing Labels of the DCASE 24 Challenge. We fine-tune three large Audio Spectrogram Transformers, PaSS

External link: http://arxiv.org/abs/2408.00791


Report

Multi-Iteration Multi-Stage Fine-Tuning of Transformers for Sound Event Detection with Heterogeneous Datasets

Authors: Schmid, Florian; Primus, Paul; Morocutti, Tobias; Greif, Jonathan; Widmer, Gerhard

A central problem in building effective sound event detection systems is the lack of high-quality, strongly annotated sound event datasets. For this reason, Task 4 of the DCASE 2024 challenge proposes learning from two heterogeneous datasets, includi

External link: http://arxiv.org/abs/2407.12997


Report

Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Authors: Schmid, Florian; Primus, Paul; Heittola, Toni; Mesaros, Annamaria; Martín-Morató, Irene; Koutini, Khaled; Widmer, Gerhard

This article describes the Data-Efficient Low-Complexity Acoustic Scene Classification Task in the DCASE 2024 Challenge and the corresponding baseline system. The task setup is a continuation of previous editions (2022 and 2023), which focused on rec

External link: http://arxiv.org/abs/2405.10018


Report

Tracing Dirac points of topological surface states by ferromagnetic resonance

Authors: Pietanesi, Laura; Marganska, Magdalena; Mayer, Thomas; Barth, Michael; Chen, Lin; Zou, Ji; Weindl, Adrian; Liebig, Alexander; Díaz-Pardo, Rebeca; Suri, Dhavala; Schmid, Florian; Gießibl, Franz J.; Richter, Klaus; Tserkovnyak, Yaroslav; Kronseder, Matthias; Back, Christian H.

Published in: Phys. Rev. B 109, 064424 (2024)

Ferromagnetic resonance is used to reveal features of the buried electronic band structure at interfaces between ferromagnetic metals and topological insulators. By monitoring the evolution of magnetic damping, the application of this method to a hyb

External link: http://arxiv.org/abs/2403.03518


Report

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

Authors: Schmid, Florian; Koutini, Khaled; Widmer, Gerhard

The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excel

External link: http://arxiv.org/abs/2310.15648


Report

Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

Authors: Morocutti, Tobias; Schmid, Florian; Koutini, Khaled; Widmer, Gerhard

The ability to generalize to a wide range of recording devices is a crucial performance factor for audio classification models. The characteristics of different types of microphones introduce distributional shifts in the digitized audio signals due t

External link: http://arxiv.org/abs/2305.07499


Report

Low-Complexity Audio Embedding Extractors

Authors: Schmid, Florian; Koutini, Khaled; Widmer, Gerhard

Solving tasks such as speaker recognition, music classification, or semantic audio event tagging with deep learning models typically requires computationally demanding networks. General-purpose audio embeddings (GPAEs) are dense representations of au

External link: http://arxiv.org/abs/2303.01879

