Showing 1 - 10 of 18
for search: '"Sivaraman, Aswin"'
In tandem with the recent advancements in foundation model research, there has been a surge of generative music AI applications within the past few years. As the idea of AI-generated or AI-augmented music becomes more mainstream, many researchers in…
External link:
http://arxiv.org/abs/2409.09378
Author:
Palaskar, Shruti, Rudovic, Oggi, Dharur, Sameer, Pesce, Florian, Krishna, Gautam, Sivaraman, Aswin, Berkowitz, Jack, Abdelaziz, Ahmed Hussen, Adya, Saurabh, Tewfik, Ahmed
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training…
External link:
http://arxiv.org/abs/2406.09617
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement
With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their…
External link:
http://arxiv.org/abs/2211.07493
The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using…
External link:
http://arxiv.org/abs/2110.10739
Author:
Sivaraman, Aswin, Kim, Minje
This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance…
External link:
http://arxiv.org/abs/2105.03542
Training personalized speech enhancement models is innately a no-shot learning problem due to privacy constraints and limited access to noise-free speech from the target user. If there is an abundance of unlabeled noisy speech from the test-time user…
External link:
http://arxiv.org/abs/2104.02018
Author:
Sivaraman, Aswin, Kim, Minje
This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement…
External link:
http://arxiv.org/abs/2104.02017
Author:
Reddy, Sravana, Yu, Yongze, Pappu, Aasish, Sivaraman, Aswin, Rezapour, Rezvaneh, Jones, Rosie
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect…
External link:
http://arxiv.org/abs/2103.02585
Author:
Sivaraman, Aswin, Kim, Minje
This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to…
External link:
http://arxiv.org/abs/2011.03426
Author:
Sivaraman, Aswin, Kim, Minje
Published in:
Interspeech 2020
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlapping subproblems and introducing a classifier, we are…
External link:
http://arxiv.org/abs/2005.08128