Recent Developments on ESPnet Toolkit Boosted by Conformer

Authors:	Tomoki Hayashi, Wangyou Zhang, Jing Shi, Hirofumi Inaguma, Daniel Garcia-Romero, Chenda Li, Xuankai Chang, Shinji Watanabe, Jiatong Shi, Kun Wei, Yuekai Zhang, Pengcheng Guo, Yosuke Higuchi, Naoyuki Kamo, Florian Boyer
Year of publication:	2021
Subject:	FOS: Computer and information sciences; Sound (cs.SD); Signal processing; Open source; Audio and Speech Processing (eess.AS); Computer science; Speech recognition; Research community; FOS: Electrical engineering, electronic engineering, information engineering; Speech processing; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing; Transformer (machine learning model)
Source:	ICASSP
DOI:	10.1109/icassp39728.2021.9414858
Description:	In this study, we present recent developments on ESPnet: End-to-End Speech Processing toolkit, which mainly involves a recently proposed architecture called Conformer, Convolution-augmented Transformer. This paper shows the results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translations (ST), speech separation (SS) and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive or even outperform the current state-of-art Transformer models. We are preparing to release all-in-one recipes using open source and publicly available corpora for all the above tasks with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments usually requiring high resources.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa47a67cbbe301048a36e7e10e0659a2 https://doi.org/10.1109/icassp39728.2021.9414858