Showing 1 - 10 of 69 for search: '"Olivier Siohan"'
Author:
Otavio Braga, Olivier Siohan
Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible, this traditionally requires solving a…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87efeea46b9885a3788208cfa8ca2af8
http://arxiv.org/abs/2205.05206
Author:
Olivier Siohan, Otavio Braga
Published in:
ICASSP
Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, until recently it had traditionally been studied in isolation, assuming the video of a single speaking face matches the audio, and sele…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::67377d966ff66617eabdf0738f0dba76
Published in:
ICASSP
Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face in the visual signal is the face matching the audio. However, in a more realistic setting, when multiple faces are potentially on sc…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b95f7cdbb91946f6eaf35b8beb634cef
Published in:
2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
Published in:
Interspeech 2021.
Author:
Basilio Garcia, Hank Liao, Olivier Siohan, Otavio Braga, Takaki Makino, Brendan Shillingford, Yannis M. Assael
Published in:
ASRU
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the development of such a system, we built a large audio-visual (A/V) dataset of segmented utteran…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8190bcd58f1c0490fa602e6fa162c6e4
Author:
Kevin W. Wilson, Erik McDermott, Arun Narayanan, Olivier Siohan, Ananya Misra, Khe Chai Sim, Ehsan Variani, J. Caroselli, Izhak Shafran, Chanwoo Kim, Matt Shannon, Mitchel Weintraub, Ron Weiss, Michiel Bacchiani, Bo Li, Richard Rose, Golan Pundak, Hasim Sak, Tara N. Sainath, K. K. Chin
Published in:
INTERSPEECH
Published in:
INTERSPEECH
Author:
Olivier Siohan
Published in:
INTERSPEECH
Published in:
SLT
Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the tr…