Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation
Author: | Hyeongju Kim, Hyung Yong Kim, Nam Soo Kim, Woo Hyun Kang, Hyeonseung Lee |
---|---|
Publication year: | 2020 |
Subject: |
Scheme (programming language); FOS: Computer and information sciences; Sound (cs.SD); Computer science; Speech recognition; Deep learning; Noise reduction; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Density estimation; Computer Science - Sound; Front and back ends; Speech enhancement; Flow (mathematics); Computer Science::Sound; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Artificial intelligence; Joint (audio engineering); Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | IJCAI |
DOI: | 10.48550/arxiv.2007.12903 |
Description: | For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs, which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective is sub-optimal and insufficient for fully training the front-end, which still leaves room for improvement. In this paper, we propose a novel approach that incorporates flow-based density estimation for the robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms the conventional techniques in which the front-end is trained only with the ASR objective. Comment: 7 pages, 3 figures |
Database: | OpenAIRE |
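
The description names flow-based density estimation but does not say which flow architecture the paper uses. As a rough illustration of the general idea only, the sketch below evaluates an exact log-likelihood through the change-of-variables rule, assuming a single elementwise affine transform and a standard normal base density. The function names and parameters (`affine_flow_logpdf`, `shift`, `log_scale`) are hypothetical and are not taken from the paper.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a standard normal, summed over all dimensions of z."""
    return sum(-0.5 * (v * v + math.log(2.0 * math.pi)) for v in z)


def affine_flow_logpdf(x, shift, log_scale):
    """Log-density of x under an elementwise affine flow (illustrative assumption).

    The flow maps data to the base space with z = (x - shift) * exp(-log_scale).
    By the change-of-variables rule:
        log p(x) = log N(z; 0, I) + log |det dz/dx|
    and for this transform log |det dz/dx| = -sum(log_scale).
    """
    z = [(xi - b) * math.exp(-s) for xi, b, s in zip(x, shift, log_scale)]
    log_det = -sum(log_scale)
    return standard_normal_logpdf(z) + log_det


if __name__ == "__main__":
    # Hypothetical 2-dimensional feature vector; a real front-end would use
    # enhanced speech features and learned, input-dependent parameters.
    features = [0.5, -1.0]
    print(affine_flow_logpdf(features, shift=[0.0, 0.0], log_scale=[0.0, 0.0]))
```

In a trained flow the shift and scale would be produced by neural networks and several such layers would be stacked; maximizing this log-likelihood on clean speech is one way a density model can provide a training signal without paired clean and noisy data.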