Search results - "NEEKHARA, PAARTH"

Report

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference

Authors: Casanova, Edresson; Langman, Ryan; Neekhara, Paarth; Hussain, Shehzeen; Li, Jason; Ghosh, Subhankar; Jukić, Ante; Lee, Sang-gil

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data. However, audio codecs often operate at hig

External link: http://arxiv.org/abs/2409.12117

Report

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Authors: Neekhara, Paarth; Hussain, Shehzeen; Ghosh, Subhankar; Li, Jason; Valle, Rafael; Badlani, Rohan; Ginsburg, Boris

Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated ou

External link: http://arxiv.org/abs/2406.17957

Report

REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models

Authors: Zhang, Ruisi; Hussain, Shehzeen Samarah; Neekhara, Paarth; Koushanfar, Farinaz

We present REMARK-LLM, a novel, efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content using LLMs necessitates vast computational resources and extensive datasets, enc

External link: http://arxiv.org/abs/2310.12362

Report

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

Authors: Neekhara, Paarth; Hussain, Shehzeen; Valle, Rafael; Ginsburg, Boris; Ranjan, Rishabh; Dubnov, Shlomo; Koushanfar, Farinaz; McAuley, Julian

We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encod

External link: http://arxiv.org/abs/2310.09653

Report

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

Authors: Hussain, Shehzeen; Neekhara, Paarth; Huang, Jocelyn; Li, Jason; Ginsburg, Boris

In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning. First, we develop a multi-task model to decompose a speech utterance into features such as linguistic content, speaker ch

External link: http://arxiv.org/abs/2302.08137

Report

FastStamp: Accelerating Neural Steganography and Digital Watermarking of Images on FPGAs

Authors: Hussain, Shehzeen; Sheybani, Nojan; Neekhara, Paarth; Zhang, Xinqiao; Duarte, Javier; Koushanfar, Farinaz

Steganography and digital watermarking are the tasks of hiding recoverable data in image pixels. Deep neural network (DNN) based image steganography and watermarking techniques are quickly replacing traditional hand-engineered pipelines. DNN based wa

External link: http://arxiv.org/abs/2209.12391

Report

ReFace: Real-time Adversarial Attacks on Face Recognition Systems

Authors: Hussain, Shehzeen; Huster, Todd; Mesterharm, Chris; Neekhara, Paarth; An, Kevin; Jere, Malhar; Sikka, Harshvardhan; Koushanfar, Farinaz

Deep neural network based face recognition models have been shown to be vulnerable to adversarial examples. However, many of the past attacks require the adversary to solve an input-dependent optimization problem using gradient descent which makes th

External link: http://arxiv.org/abs/2206.04783

Report

FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes

Authors: Neekhara, Paarth; Hussain, Shehzeen; Zhang, Xinqiao; Huang, Ke; McAuley, Julian; Koushanfar, Farinaz

Deepfakes and manipulated media are becoming a prominent threat due to the recent advances in realistic image and video synthesis techniques. There have been several attempts at combating Deepfakes using machine learning classifiers. However, such cl

External link: http://arxiv.org/abs/2204.01960

Report

Adapting TTS models For New Speakers using Transfer Learning

Authors: Neekhara, Paarth; Li, Jason; Ginsburg, Boris

Training neural text-to-speech (TTS) models for a new speaker typically requires several hours of high quality speech data. Prior works on voice cloning attempt to address this challenge by adapting pre-trained multi-speaker TTS models for a new voic

External link: http://arxiv.org/abs/2110.05798

Report

WaveGuard: Understanding and Mitigating Audio Adversarial Examples

Authors: Hussain, Shehzeen; Neekhara, Paarth; Dubnov, Shlomo; McAuley, Julian; Koushanfar, Farinaz

There has been a recent surge in adversarial attacks on deep learning based automatic speech recognition (ASR) systems. These attacks pose new challenges to deep learning security and have raised significant concerns in deploying ASR systems in safet

External link: http://arxiv.org/abs/2103.03344
