Showing 1 - 10 of 69 for search: '"Olivier Siohan"'
Author:
Otavio Braga, Olivier Siohan
Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible, this traditionally requires solving a…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87efeea46b9885a3788208cfa8ca2af8
http://arxiv.org/abs/2205.05206
Author:
Olivier Siohan, Otavio Braga
Published in:
ICASSP
Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, until recently it had traditionally been studied in isolation, assuming the video of a single speaking face matches the audio, and sele…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::67377d966ff66617eabdf0738f0dba76
Published in:
ICASSP
Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face in the visual signal is the face matching the audio. However, in a more realistic setting, when multiple faces are potentially on sc…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b95f7cdbb91946f6eaf35b8beb634cef
Published in:
2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
Published in:
Interspeech 2021.
Author:
Basilio Garcia, Hank Liao, Olivier Siohan, Otavio Braga, Takaki Makino, Brendan Shillingford, Yannis M. Assael
Published in:
ASRU
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the development of such a system, we built a large audio-visual (A/V) dataset of segmented utteran…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8190bcd58f1c0490fa602e6fa162c6e4
Author:
Kevin W. Wilson, Erik McDermott, Arun Narayanan, Olivier Siohan, Ananya Misra, Khe Chai Sim, Ehsan Variani, J. Caroselli, Izhak Shafran, Chanwoo Kim, Matt Shannon, Mitchel Weintraub, Ron Weiss, Michiel Bacchiani, Bo Li, Richard Rose, Golan Pundak, Hasim Sak, Tara N. Sainath, K. K. Chin
Published in:
INTERSPEECH
Published in:
INTERSPEECH
Author:
Olivier Siohan
Published in:
INTERSPEECH
Published in:
SLT
Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the tr…