Showing 1 - 10 of 494 for search: '"audio embeddings"'
Author:
Devnani, Bhavika, Seto, Skyler, Aldeneh, Zakaria, Toso, Alessandro, Menyaylenko, Elena, Theobald, Barry-John, Sheaffer, Jonathan, Sarabia, Miguel
Humans can picture a sound scene given an imprecise natural language description. For example, it is easy to imagine an acoustic environment given a phrase like "the lion roar came from right behind me!". For a machine to have the same degree of comp…
External link:
http://arxiv.org/abs/2409.11369
Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the co…
External link:
http://arxiv.org/abs/2409.09026
Every artist has a creative process that draws inspiration from previous artists and their works. Today, "inspiration" has been automated by generative music models. The black box nature of these models obscures the identity of the works that influen…
External link:
http://arxiv.org/abs/2401.14542
Author:
Verma, Prateek
With the advent of modern AI architectures, a shift has happened towards end-to-end architectures. This pivot has led to neural architectures being trained without domain-specific biases/knowledge, optimized according to the task. We in this paper, l…
External link:
http://arxiv.org/abs/2309.08751
Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all app…
External link:
http://arxiv.org/abs/2307.10834
Author:
Ding, Yiwei, Lerch, Alexander
Music classification has been one of the most popular tasks in the field of music information retrieval. With the development of deep learning models, the last decade has seen impressive improvements in a wide range of classification tasks. However, …
External link:
http://arxiv.org/abs/2306.17424
Pre-trained models (PTMs) have shown great promise in the speech and audio domain. Embeddings leveraged from these models serve as inputs for learning algorithms with applications in various downstream tasks. One such crucial task is Speech Emotion R…
External link:
http://arxiv.org/abs/2304.11472
We present an analysis of large-scale pretrained deep learning models used for cross-modal (text-to-audio) retrieval. We use embeddings extracted by these models in a metric learning framework to connect matching pairs of audio and text. Shallow neur…
External link:
http://arxiv.org/abs/2210.02833
Audio fingerprinting systems must efficiently and robustly identify query snippets in an extensive database. To this end, state-of-the-art systems use deep learning to generate compact audio fingerprints. These systems deploy indexing methods, which…
External link:
http://arxiv.org/abs/2211.11060
An ideal audio retrieval system efficiently and robustly recognizes a short query snippet from an extensive database. However, the performance of well-known audio fingerprinting systems falls short at high signal distortion levels. This paper present…
External link:
http://arxiv.org/abs/2210.08624