Showing 1 - 10 of 55
for query: '"Jean-Marc Valin"'
Author:
Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy
Singing voice separation aims to separate music into vocals and accompaniment components. One of the major constraints for the task is the limited amount of training data with separated vocals. Data augmentation techniques such as random source mixing…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2ff149508d14b91886a1b51604c0f42e
http://arxiv.org/abs/2203.15092
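As a rough illustration of the random source mixing augmentation this abstract refers to, here is a minimal numpy sketch; the function name, gain range, and stand-in stems are illustrative assumptions, not the paper's code:

```python
import numpy as np

def random_mix(vocals: np.ndarray, accompaniment: np.ndarray,
               rng: np.random.Generator, gain_db_range=(-6.0, 6.0)):
    """Create an augmented training pair by remixing separated stems.

    Each stem is scaled by an independent random gain, so a small pool of
    clean stems yields many distinct (mixture, target) training examples.
    """
    g_voc = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    g_acc = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    vocals = g_voc * vocals
    accompaniment = g_acc * accompaniment
    mixture = vocals + accompaniment
    return mixture, vocals  # network input and separation target

# Example: augment one pair of stand-in stems
rng = np.random.default_rng(0)
voc = rng.standard_normal(16000).astype(np.float32)  # stand-in vocal stem
acc = rng.standard_normal(16000).astype(np.float32)  # stand-in accompaniment
mix, target = random_mix(voc, acc, rng)
```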
Published in:
Interspeech 2021.
Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing. However, it also adds an inherent latency due to the transmission of the audio…
Published in:
ICASSP
Speech enhancement algorithms based on deep learning have greatly surpassed their traditional counterparts and are now being considered for the task of removing acoustic echo from hands-free communication systems. This is a challenging problem due to…
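For contrast with the traditional counterparts the abstract mentions, here is a minimal sketch of a classical NLMS adaptive echo canceller; this is the baseline technique, not the paper's neural method, and all parameters are illustrative:

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=128, mu=0.5, eps=1e-8):
    """Classical NLMS adaptive filter: estimate the echo path from the
    far-end (loudspeaker) signal and subtract that estimate from the mic."""
    w = np.zeros(taps)            # adaptive FIR estimate of the echo path
    x_buf = np.zeros(taps)        # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf                        # near-end + residual echo
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # normalized LMS update
        out[n] = e
    return out

# Example: cancel a synthetic 3-tap echo of a stand-in far-end signal
rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
mic = np.convolve(far, np.array([0.0, 0.5, 0.3, 0.1]))[:4000]
cleaned = nlms_echo_canceller(far, mic)
```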
Published in:
ICASSP
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::44339601571ae00aff4c53c7d151d616
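A minimal sketch of the codebook lookup at the core of such discretized (vector-quantized) neural autoencoders; the codebook size, shapes, and function name are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (T, D) encoder outputs
    codebook: (K, D) learned codebook
    Returns integer codes (the bitstream payload) and the dequantized
    vectors that would be fed to the decoder.
    """
    # Squared distance between every latent and every codebook entry
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)           # (T,) discrete symbols to transmit
    return codes, codebook[codes]      # dequantized latents

rng = np.random.default_rng(0)
z = rng.standard_normal((50, 16))      # stand-in encoder latents
cb = rng.standard_normal((256, 16))    # 256-entry codebook -> 8 bits per frame
codes, z_q = quantize(z, cb)
```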
The presence of multiple talkers in the surrounding environment poses a difficult challenge for real-time speech communication systems considering the constraints on network size and complexity. In this paper, we present Personalized PercepNet, a real-time…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ba4c3acfb95325528049f11992ab1230
Published in:
ICASSP
Recent progress in singing voice separation has primarily focused on supervised deep learning methods. However, the scarcity of ground-truth data with clean musical sources has been a problem for long. Given a limited set of labeled data, we present…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a6874112c68cd4bf715846a9ea0d28d7
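A schematic sketch of a self-training loop of the kind the abstract describes, where a model trained on the scarce labeled set pseudo-labels unlabeled mixtures for the next training round; `train` and `separate` are trivial stand-ins, not the paper's code:

```python
# Schematic self-training loop. `train` and `separate` are trivial stand-ins
# for a real separation model; only the loop structure reflects the idea.
def train(pairs):
    return {"n_examples": len(pairs)}      # stand-in "model"

def separate(model, mixture):
    return mixture * 0.5                   # stand-in vocal estimate

def self_training(labeled_pairs, unlabeled_mixtures, rounds=3):
    teacher = train(labeled_pairs)                       # supervised on scarce ground truth
    for _ in range(rounds):
        pseudo = [(m, separate(teacher, m))              # teacher's estimates become
                  for m in unlabeled_mixtures]           # noisy labels
        teacher = train(labeled_pairs + pseudo)          # retrain on real + pseudo labels
    return teacher

model = self_training([(1.0, 0.4)] * 10, [0.8] * 100)
```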
Author:
Ritwik Giri, Jean-Marc Valin, Karim Helwani, Umut Isik, Neerad Phansalkar, Arvindh Krishnaswamy
Published in:
INTERSPEECH
Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of these new techniques operate directly in the short-time Fourier transform…
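A minimal sketch of STFT-domain enhancement as described: analyze, apply a per-bin gain mask, resynthesize. In a deep model the mask would be predicted by a network; the crude Wiener-style gain here is only a stand-in:

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, nperseg=512):
    """STFT-domain enhancement skeleton: transform, mask, inverse transform.
    A neural model would replace the hand-made `mask` computation."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    power = np.abs(Z) ** 2
    noise = power.mean(axis=1, keepdims=True) * 0.1   # stand-in noise estimate
    mask = power / (power + noise)                    # gain in [0, 1] per T-F bin
    _, clean = istft(Z * mask, fs=fs, nperseg=nperseg)
    return clean

x = np.random.default_rng(0).standard_normal(16000)  # stand-in noisy signal
y = enhance(x)
```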
Author:
Arvindh Krishnaswamy, Karim Helwani, Umut Isik, Neerad Phansalkar, Ritwik Giri, Jean-Marc Valin
Published in:
INTERSPEECH
Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1f0954a885e6ab74a755411ca526b032
http://arxiv.org/abs/2008.04470
Author:
Nathan E. Egge, Urvang Joshi, Yaowu Xu, Andrey Norkin, Sarah Parker, Yunqing Wang, Jim Bankoski, Paul Wilkins, Jean-Marc Valin, Steinar Midtskogen, Yue Chen, Thomas Davies, Zoe Liu, Peter de Rivaz, Luc Trudeau, Jingning Han, Debargha Mukherjee, Cheng Chen, Hui Su, Ching-Han Chiang, Adrian Grange
Published in:
APSIPA Transactions on Signal and Information Processing. 9
In 2018, the Alliance for Open Media (AOMedia) finalized its first video compression format, AV1, which was jointly developed by an industry consortium of leading video technology companies. The main goal of AV1 is to provide an open-source and royalty-free…
Author:
Jean-Marc Valin, Jan Skoglund
Published in:
INTERSPEECH
Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrate. They have so far demonstrated quality that far exceeds traditional vocoders, at the cost of very high complexity. In this work, we present a low-bitrate…
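A minimal sketch of the linear-prediction step that keeps vocoders like LPCNet low-complexity: with a short-term predictor computed per frame, the network only has to model the much simpler prediction residual. The autocorrelation LPC below is a generic sketch under that assumption, not the paper's code:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame: np.ndarray, order: int = 16) -> np.ndarray:
    """Linear prediction coefficients via the autocorrelation method,
    solving the Toeplitz normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return a  # predict s[n] ~= sum_k a[k] * s[n-1-k]

rng = np.random.default_rng(0)
s = rng.standard_normal(640)           # stand-in 40 ms frame at 16 kHz
a = lpc(s)
pred = np.convolve(s, np.concatenate(([0.0], a)))[:len(s)]
residual = s - pred                    # what the neural model actually codes
```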