Showing 1 - 10 of 13
for search: '"Soumi Maiti"'
Author:
Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Multilingual Automatic Speech Recognition (ASR) models have extended the usability of speech technologies to a wide variety of languages. With so many languages to handle, however, a key to understanding their imbalanced performance …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::54c5280273c456ca7eb9e9eea5461fec
Author:
Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe
Published in:
Interspeech 2022.
In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d180dc2bec745d232e25c3f5eccf8452
While human evaluation is the most reliable metric for evaluating speech generation systems, it is generally costly and time-consuming. Previous studies on automatic speech quality assessment address the problem by predicting human evaluation scores …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7967b940812b5bdf7db054acfcf93aa1
Published in:
ICASSP
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::615b36526c772878d3560d71bb6cf9e7
Author:
Michael I. Mandel, Soumi Maiti
Published in:
ICASSP
Traditional speech enhancement systems produce speech with compromised quality. Here we propose to use the high-quality speech generation capability of neural vocoders for better quality speech enhancement. We term this parametric resynthesis (PR) …
Published in:
ICASSP
We present progress towards bilingual Text-to-Speech which is able to transform a monolingual voice to speak a second language while preserving speaker voice quality. We demonstrate that a bilingual speaker embedding space contains a separate distribution …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::698c74e9e8a040e13a94e5597f52a908
http://arxiv.org/abs/2004.04972
Published in:
Journal of Evolution of Medical and Dental Sciences. 6:2422-2427
Author:
Soumi Maiti, Michael I. Mandel
Published in:
ICASSP
This work proposes the use of clean speech vocoder parameters as the target for a neural network performing speech enhancement. These parameters have been designed for text-to-speech synthesis so that they both produce high-quality resyntheses and …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::81363d29e1f4232b347d2b7fb643bb05