ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Author:	Li, Chenda; Shi, Jing; Zhang, Wangyou; Subramanian, Aswin Shanmugam; Chang, Xuankai; Kamo, Naoyuki; Hira, Moto; Hayashi, Tomoki; Boeddeker, Christoph; Chen, Zhuo; Watanabe, Shinji
Publication year:	2020
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound
Document type:	Working Paper
DOI:	10.1109/SLT48900.2021.9383615
Description:	We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates rich automatic speech recognition (ASR) related models, resources, and systems to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It can process both single-channel and multi-channel data, with functionalities including dereverberation, denoising, and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training, and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit; several important functionalities, especially the speech recognition integration that differentiates ESPnet-SE from other open-source toolkits; and experimental results on major benchmark datasets. Comment: Accepted by SLT 2021
Database:	arXiv
External link:	http://arxiv.org/abs/2011.03706