Showing 1 - 10 of 118 for search: '"Bello, Juan Pablo"'
Author:
Wilkins, Julia, Fuentes, Magdalena, Bondi, Luca, Ghaffarzadegan, Shabnam, Abavisani, Ali, Bello, Juan Pablo
Sound event localization and detection (SELD) systems estimate both the direction-of-arrival (DOA) and class of sound sources over time. In the DCASE 2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel setting. While beneficia
External link:
http://arxiv.org/abs/2309.13343
Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source
External link:
http://arxiv.org/abs/2309.09288
Finding the right sound effects (SFX) to match moments in a video is a difficult and time-consuming task, and relies heavily on the quality and completeness of text metadata. Retrieving high-quality (HQ) SFX using a video frame directly as the query
External link:
http://arxiv.org/abs/2308.09089
Multi-modal contrastive learning techniques in the audio-text domain have quickly become a highly active area of research. Most works are evaluated with standard audio retrieval and classification benchmarks assuming that (i) these models are capable
External link:
http://arxiv.org/abs/2303.10667
Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos. While it proves to be effective for widely used b
External link:
http://arxiv.org/abs/2211.08367
Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separat
External link:
http://arxiv.org/abs/2205.01273
Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and urban traffi
External link:
http://arxiv.org/abs/2204.05156
Author:
Srivastava, Sangeeta, Wu, Ho-Hsiang, Rulff, Joao, Fuentes, Magdalena, Cartwright, Mark, Silva, Claudio, Arora, Anish, Bello, Juan Pablo
Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine emb
External link:
http://arxiv.org/abs/2203.10425
Author:
Yun, Jihoon, Srivastava, Sangeeta, Roy, Dhrubojyoti, Stohs, Nathan, Mydlarz, Charlie, Salman, Mahin, Steers, Bea, Bello, Juan Pablo, Arora, Anish
The Sounds of New York City (SONYC) wireless sensor network (WSN) has been fielded in Manhattan and Brooklyn over the past five years, as part of a larger human-in-the-loop cyber-physical control system for monitoring, analyzing, and mitigating urban
External link:
http://arxiv.org/abs/2203.06220
We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation,
External link:
http://arxiv.org/abs/2110.11499