Showing 1 - 10 of 64 for the search: '"Wisdom, Scott"'
Author:
Dementyev, Artem, Reddy, Chandan K. A., Wisdom, Scott, Chatlani, Navin, Hershey, John R., Lyon, Richard F.
Low-latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement …
External link:
http://arxiv.org/abs/2409.18239
Author:
Leglaive, Simon, Fraticelli, Matthieu, ElGhazaly, Hend, Borne, Léonie, Sadeghi, Mostafa, Wisdom, Scott, Pariente, Manuel, Hershey, John R., Pressnitzer, Daniel, Barker, Jon P.
Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals. However, the synthetic training conditions may not accurately reflect real-world conditions encountered during testing. This …
External link:
http://arxiv.org/abs/2402.01413
Author:
Erdogan, Hakan, Wisdom, Scott, Chang, Xuankai, Borsos, Zalán, Tagliasacchi, Marco, Zeghidour, Neil, Hershey, John R.
We present TokenSplit, a speech separation model that acts on discrete token sequences. The model is trained on multiple tasks simultaneously: separate and transcribe each speech source, and generate speech from text. The model operates on transcript …
External link:
http://arxiv.org/abs/2308.10415
Author:
Leglaive, Simon, Borne, Léonie, Tzinis, Efthymios, Sadeghi, Mostafa, Fraticelli, Matthieu, Wisdom, Scott, Pariente, Manuel, Pressnitzer, Daniel, Hershey, John R.
Published in:
The 7th International Workshop on Speech Processing in Everyday Environments (CHiME), Dublin, Ireland, 2023
Supervised speech enhancement models are trained using artificially generated mixtures of clean speech and noise signals, which may not match real-world recording conditions at test time. This mismatch can lead to poor performance if the test domain …
External link:
http://arxiv.org/abs/2307.03533
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work generalizes the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel …
External link:
http://arxiv.org/abs/2305.11151
In a range of recent works, object-centric architectures have been shown to be suitable for unsupervised scene decomposition in the vision domain. Inspired by these methods, we present AudioSlots, a slot-centric generative model for blind source separation …
External link:
http://arxiv.org/abs/2305.05591
We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify several limitations …
External link:
http://arxiv.org/abs/2207.10141
We propose the novel task of distance-based sound separation, where sounds are separated based only on their distance from a single microphone. In the context of assisted listening devices, proximity provides a simple criterion for sound selection in …
External link:
http://arxiv.org/abs/2207.00562
Author:
Kilgour, Kevin, Gfeller, Beat, Huang, Qingqing, Jansen, Aren, Wisdom, Scott, Tagliasacchi, Marco
We propose a method of separating a desired sound source from a single-channel mixture, based on either a textual description or a short audio sample of the target source. This is achieved by combining two distinct models. The first model, SoundWords …
External link:
http://arxiv.org/abs/2204.05738
Author:
Muckenhirn, Hannah, Safin, Aleksandr, Erdogan, Hakan, Quitry, Felix de Chaumont, Tagliasacchi, Marco, Wisdom, Scott, Hershey, John R.
Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance. The main limitation of this approach is that such models can only be trained on large amounts …
External link:
http://arxiv.org/abs/2203.15652