Showing 1 - 10 of 64 for the search: '"Wisdom, Scott"'
Author:
Dementyev, Artem, Reddy, Chandan K. A., Wisdom, Scott, Chatlani, Navin, Hershey, John R., Lyon, Richard F.
Low-latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement …
External link:
http://arxiv.org/abs/2409.18239
Author:
Leglaive, Simon, Fraticelli, Matthieu, ElGhazaly, Hend, Borne, Léonie, Sadeghi, Mostafa, Wisdom, Scott, Pariente, Manuel, Hershey, John R., Pressnitzer, Daniel, Barker, Jon P.
Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals. However, the synthetic training conditions may not accurately reflect real-world conditions encountered during testing. This …
External link:
http://arxiv.org/abs/2402.01413
Author:
Erdogan, Hakan, Wisdom, Scott, Chang, Xuankai, Borsos, Zalán, Tagliasacchi, Marco, Zeghidour, Neil, Hershey, John R.
We present TokenSplit, a speech separation model that acts on discrete token sequences. The model is trained on multiple tasks simultaneously: separate and transcribe each speech source, and generate speech from text. The model operates on transcript …
External link:
http://arxiv.org/abs/2308.10415
Author:
Leglaive, Simon, Borne, Léonie, Tzinis, Efthymios, Sadeghi, Mostafa, Fraticelli, Matthieu, Wisdom, Scott, Pariente, Manuel, Pressnitzer, Daniel, Hershey, John R.
Published in:
The 7th International Workshop on Speech Processing in Everyday Environments (CHiME), Dublin, Ireland, 2023
Supervised speech enhancement models are trained using artificially generated mixtures of clean speech and noise signals, which may not match real-world recording conditions at test time. This mismatch can lead to poor performance if the test domain …
External link:
http://arxiv.org/abs/2307.03533
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work generalizes the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel …
External link:
http://arxiv.org/abs/2305.11151
In a range of recent works, object-centric architectures have been shown to be suitable for unsupervised scene decomposition in the vision domain. Inspired by these methods, we present AudioSlots, a slot-centric generative model for blind source separation …
External link:
http://arxiv.org/abs/2305.05591
We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify several limitations …
External link:
http://arxiv.org/abs/2207.10141
We propose the novel task of distance-based sound separation, where sounds are separated based only on their distance from a single microphone. In the context of assisted listening devices, proximity provides a simple criterion for sound selection in …
External link:
http://arxiv.org/abs/2207.00562
Author:
Kilgour, Kevin, Gfeller, Beat, Huang, Qingqing, Jansen, Aren, Wisdom, Scott, Tagliasacchi, Marco
We propose a method of separating a desired sound source from a single-channel mixture, based on either a textual description or a short audio sample of the target source. This is achieved by combining two distinct models. The first model, SoundWords …
External link:
http://arxiv.org/abs/2204.05738
Author:
Muckenhirn, Hannah, Safin, Aleksandr, Erdogan, Hakan, Quitry, Felix de Chaumont, Tagliasacchi, Marco, Wisdom, Scott, Hershey, John R.
Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance. The main limitation of this approach is that such models can only be trained on large amounts …
External link:
http://arxiv.org/abs/2203.15652