Search results - "Takahashi, Naoya"

Report

LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking

Authors: Singh, Mayank Kumar, Takahashi, Naoya, Liao, Wei-Hsiang, Mitsufuji, Yuki

This paper presents a novel approach to deter unauthorized deepfakes and enable user tracking in generative models, even when the user has full access to the model parameters, by integrating key-based model authentication with watermarking techniques

External link: http://arxiv.org/abs/2409.07743

Report

SilentCipher: Deep Audio Watermarking

Authors: Singh, Mayank Kumar, Takahashi, Naoya, Liao, Weihsiang, Mitsufuji, Yuki

In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and ro

External link: http://arxiv.org/abs/2406.03822

Report

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Authors: Shimada, Kazuki, Politis, Archontis, Sudarsanam, Parthasaarathy, Krause, Daniel, Uchida, Kengo, Adavanne, Sharath, Hakala, Aapo, Koyama, Yuichiro, Takahashi, Naoya, Takahashi, Shusuke, Virtanen, Tuomas, Mitsufuji, Yuki

While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of

External link: http://arxiv.org/abs/2306.09126

Report

Iteratively Improving Speech Recognition and Voice Conversion

Authors: Singh, Mayank Kumar, Takahashi, Naoya, Onoe, Naoyuki

Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples. However, for the low-data resource domains, training a high-quality ASR remains

External link: http://arxiv.org/abs/2305.15055

Report

The Whole Is Greater than the Sum of Its Parts: Improving Music Source Separation by Bridging Network

Authors: Sawata, Ryosuke, Takahashi, Naoya, Uhlich, Stefan, Takahashi, Shusuke, Mitsufuji, Yuki

Published in: EURASIP Journal on Audio, Speech, and Music Processing (JASM), 39 (2024)

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL)

External link: http://arxiv.org/abs/2305.07855

Report

Cross-modal Face- and Voice-style Transfer

Authors: Takahashi, Naoya, Singh, Mayank K., Mitsufuji, Yuki

Image-to-image translation and voice conversion enable the generation of a new facial image and voice while maintaining some of the semantics such as a pose in an image and linguistic content in audio, respectively. They can aid in the content-creati

External link: http://arxiv.org/abs/2302.13838

Report

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

Authors: Shah, Nirmesh, Singh, Mayank Kumar, Takahashi, Naoya, Onoe, Naoyuki

Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emot

External link: http://arxiv.org/abs/2302.10536

Report

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Authors: Dong, Hao-Wen, Takahashi, Naoya, Mitsufuji, Yuki, McAuley, Julian, Berg-Kirkpatrick, Taylor

Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio m

External link: http://arxiv.org/abs/2212.07065

Report

Robust One-Shot Singing Voice Conversion

Authors: Takahashi, Naoya, Singh, Mayank Kumar, Mitsufuji, Yuki

Recent progress in deep generative models has improved the quality of voice conversion in the speech domain. However, high-quality singing voice conversion (SVC) of unseen singers remains challenging due to the wider variety of musical expressions in

External link: http://arxiv.org/abs/2210.11096

Report

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Authors: Takahashi, Naoya, Singh, Mayank Kumar, Mitsufuji, Yuki

Recent progress in deep generative models has improved the quality of neural vocoders in speech domain. However, generating a high-quality singing voice remains challenging due to a wider variety of musical expressions in pitch, loudness, and pronunc

External link: http://arxiv.org/abs/2210.07508
