Showing 1 - 10 of 18
for search: '"Sivaraman, Aswin"'
In tandem with the recent advancements in foundation model research, there has been a surge of generative music AI applications within the past few years. As the idea of AI-generated or AI-augmented music becomes more mainstream, many researchers in…
External link:
http://arxiv.org/abs/2409.09378
Author:
Palaskar, Shruti, Rudovic, Oggi, Dharur, Sameer, Pesce, Florian, Krishna, Gautam, Sivaraman, Aswin, Berkowitz, Jack, Abdelaziz, Ahmed Hussen, Adya, Saurabh, Tewfik, Ahmed
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training…
External link:
http://arxiv.org/abs/2406.09617
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement
With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their…
External link:
http://arxiv.org/abs/2211.07493
The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using…
External link:
http://arxiv.org/abs/2110.10739
Author:
Sivaraman, Aswin, Kim, Minje
This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance…
External link:
http://arxiv.org/abs/2105.03542
Training personalized speech enhancement models is innately a no-shot learning problem due to privacy constraints and limited access to noise-free speech from the target user. If there is an abundance of unlabeled noisy speech from the test-time user…
External link:
http://arxiv.org/abs/2104.02018
Author:
Sivaraman, Aswin, Kim, Minje
This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement…
External link:
http://arxiv.org/abs/2104.02017
Author:
Reddy, Sravana, Yu, Yongze, Pappu, Aasish, Sivaraman, Aswin, Rezapour, Rezvaneh, Jones, Rosie
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect…
External link:
http://arxiv.org/abs/2103.02585
Author:
Sivaraman, Aswin, Kim, Minje
This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to…
External link:
http://arxiv.org/abs/2011.03426
Author:
Sivaraman, Aswin, Kim, Minje
Published in:
Interspeech 2020
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlapping subproblems and introducing a classifier, we are…
External link:
http://arxiv.org/abs/2005.08128