Showing 1 - 10 of 55
for query: '"Jean-Marc Valin"'
Author:
Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy
Singing voice separation aims to separate music into vocals and accompaniment components. One of the major constraints for the task is the limited amount of training data with separated vocals. Data augmentation techniques such as random source mixing…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2ff149508d14b91886a1b51604c0f42e
http://arxiv.org/abs/2203.15092
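As a rough illustration of the random source mixing augmentation this abstract refers to, here is a minimal numpy sketch; the function name, gain range, and stand-in stems are illustrative assumptions, not the paper's code:

```python
import numpy as np

def random_mix(vocals: np.ndarray, accompaniment: np.ndarray,
               rng: np.random.Generator, gain_db_range=(-6.0, 6.0)):
    """Create an augmented training pair by remixing separated stems.

    Each stem is scaled by an independent random gain, so a small pool of
    clean stems yields many distinct (mixture, target) training examples.
    """
    g_voc = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    g_acc = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    vocals = g_voc * vocals
    accompaniment = g_acc * accompaniment
    mixture = vocals + accompaniment
    return mixture, vocals  # network input and separation target

# Example: augment one pair of stand-in stems
rng = np.random.default_rng(0)
voc = rng.standard_normal(16000).astype(np.float32)  # stand-in vocal stem
acc = rng.standard_normal(16000).astype(np.float32)  # stand-in accompaniment
mix, target = random_mix(voc, acc, rng)
```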
Published in:
Interspeech 2021.
Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing. However, it also adds an inherent latency due to the transmission of the audio…
Published in:
ICASSP
Speech enhancement algorithms based on deep learning have greatly surpassed their traditional counterparts and are now being considered for the task of removing acoustic echo from hands-free communication systems. This is a challenging problem due to…
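For contrast with the traditional counterparts the abstract mentions, here is a minimal sketch of a classical NLMS adaptive echo canceller; this is the baseline technique, not the paper's neural method, and all parameters are illustrative:

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=128, mu=0.5, eps=1e-8):
    """Classical NLMS adaptive filter: estimate the echo path from the
    far-end (loudspeaker) signal and subtract that estimate from the mic."""
    w = np.zeros(taps)            # adaptive FIR estimate of the echo path
    x_buf = np.zeros(taps)        # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf                        # near-end + residual echo
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # normalized LMS update
        out[n] = e
    return out

# Example: cancel a synthetic 3-tap echo of a stand-in far-end signal
rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
mic = np.convolve(far, np.array([0.0, 0.5, 0.3, 0.1]))[:4000]
cleaned = nlms_echo_canceller(far, mic)
```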
Published in:
ICASSP
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::44339601571ae00aff4c53c7d151d616
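A minimal sketch of the codebook lookup at the core of such discretized (vector-quantized) neural autoencoders; the codebook size, shapes, and function name are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (T, D) encoder outputs
    codebook: (K, D) learned codebook
    Returns integer codes (the bitstream payload) and the dequantized
    vectors that would be fed to the decoder.
    """
    # Squared distance between every latent and every codebook entry
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)           # (T,) discrete symbols to transmit
    return codes, codebook[codes]      # dequantized latents

rng = np.random.default_rng(0)
z = rng.standard_normal((50, 16))      # stand-in encoder latents
cb = rng.standard_normal((256, 16))    # 256-entry codebook -> 8 bits per frame
codes, z_q = quantize(z, cb)
```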
The presence of multiple talkers in the surrounding environment poses a difficult challenge for real-time speech communication systems considering the constraints on network size and complexity. In this paper, we present Personalized PercepNet, a real-time…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ba4c3acfb95325528049f11992ab1230
Published in:
ICASSP
Recent progress in singing voice separation has primarily focused on supervised deep learning methods. However, the scarcity of ground-truth data with clean musical sources has been a problem for long. Given a limited set of labeled data, we present…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a6874112c68cd4bf715846a9ea0d28d7
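A schematic sketch of a self-training loop of the kind the abstract describes, where a model trained on the scarce labeled set pseudo-labels unlabeled mixtures for the next training round; `train` and `separate` are trivial stand-ins, not the paper's code:

```python
# Schematic self-training loop. `train` and `separate` are trivial stand-ins
# for a real separation model; only the loop structure reflects the idea.
def train(pairs):
    return {"n_examples": len(pairs)}      # stand-in "model"

def separate(model, mixture):
    return mixture * 0.5                   # stand-in vocal estimate

def self_training(labeled_pairs, unlabeled_mixtures, rounds=3):
    teacher = train(labeled_pairs)                       # supervised on scarce ground truth
    for _ in range(rounds):
        pseudo = [(m, separate(teacher, m))              # teacher's estimates become
                  for m in unlabeled_mixtures]           # noisy labels
        teacher = train(labeled_pairs + pseudo)          # retrain on real + pseudo labels
    return teacher

model = self_training([(1.0, 0.4)] * 10, [0.8] * 100)
```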
Author:
Ritwik Giri, Jean-Marc Valin, Karim Helwani, Umut Isik, Neerad Phansalkar, Arvindh Krishnaswamy
Published in:
INTERSPEECH
Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of these new techniques operate directly in the short-time Fourier transform…
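A minimal sketch of STFT-domain enhancement as described: analyze, apply a per-bin gain mask, resynthesize. In a deep model the mask would be predicted by a network; the crude Wiener-style gain here is only a stand-in:

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, nperseg=512):
    """STFT-domain enhancement skeleton: transform, mask, inverse transform.
    A neural model would replace the hand-made `mask` computation."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    power = np.abs(Z) ** 2
    noise = power.mean(axis=1, keepdims=True) * 0.1   # stand-in noise estimate
    mask = power / (power + noise)                    # gain in [0, 1] per T-F bin
    _, clean = istft(Z * mask, fs=fs, nperseg=nperseg)
    return clean

x = np.random.default_rng(0).standard_normal(16000)  # stand-in noisy signal
y = enhance(x)
```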
Author:
Arvindh Krishnaswamy, Karim Helwani, Umut Isik, Neerad Phansalkar, Ritwik Giri, Jean-Marc Valin
Published in:
INTERSPEECH
Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1f0954a885e6ab74a755411ca526b032
http://arxiv.org/abs/2008.04470
Author:
Nathan E. Egge, Urvang Joshi, Yaowu Xu, Andrey Norkin, Sarah Parker, Yunqing Wang, Jim Bankoski, Paul Wilkins, Jean-Marc Valin, Steinar Midtskogen, Yue Chen, Thomas Davies, Zoe Liu, Peter de Rivaz, Luc Trudeau, Jingning Han, Debargha Mukherjee, Cheng Chen, Hui Su, Ching-Han Chiang, Adrian Grange
Published in:
APSIPA Transactions on Signal and Information Processing. 9
In 2018, the Alliance for Open Media (AOMedia) finalized its first video compression format, AV1, which was jointly developed by an industry consortium of leading video technology companies. The main goal of AV1 is to provide an open-source and royalty-free…
Author:
Jean-Marc Valin, Jan Skoglund
Published in:
INTERSPEECH
Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrate. They have so far demonstrated quality that far exceeds traditional vocoders, at the cost of very high complexity. In this work, we present a low-bitrate…
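A minimal sketch of the linear-prediction step that keeps vocoders like LPCNet low-complexity: with a short-term predictor computed per frame, the network only has to model the much simpler prediction residual. The autocorrelation LPC below is a generic sketch under that assumption, not the paper's code:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame: np.ndarray, order: int = 16) -> np.ndarray:
    """Linear prediction coefficients via the autocorrelation method,
    solving the Toeplitz normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return a  # predict s[n] ~= sum_k a[k] * s[n-1-k]

rng = np.random.default_rng(0)
s = rng.standard_normal(640)           # stand-in 40 ms frame at 16 kHz
a = lpc(s)
pred = np.convolve(s, np.concatenate(([0.0], a)))[:len(s)]
residual = s - pred                    # what the neural model actually codes
```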