Autoregressive Parameter Estimation with DNN-based Pre-processing

Authors:	Zihao Cui, Mads Grasboll Christensen, Changchun Bao, Jesper Kjar Nielsen
Language:	English
Year of publication:	2020
Subject:	Artificial neural network; business.industry; Computer science; Recursion (computer science); Pattern recognition; 02 engineering and technology; Speech processing; Signal; 030507 speech-language pathology & audiology; 03 medical and health sciences; Autoregressive model; Distortion; generalized analysis-by-synthesis; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; 0305 other medical science; Divergence (statistics); business; Encoder; Auto-regressive model; DNN; Levinson-Durbin recursion
Source:	Cui, Z, Bao, C, Nielsen, J K & Christensen, M G 2020, Autoregressive Parameter Estimation with DNN-based Pre-processing. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 9053755, IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 6759-6763, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 04/05/2020. https://doi.org/10.1109/ICASSP40776.2020.9053755
DOI:	10.1109/ICASSP40776.2020.9053755
Description:	In this paper, a method for estimating the autoregressive parameters from a signal segment is proposed. The method is based on a deep neural network (DNN) in combination with the classical Levinson-Durbin recursion (LDR). The DNN acts as a pre-processor for the LDR and can be trained on different metrics commonly encountered in speech processing using a generalized analysis-by-synthesis (GABS) structure where the LDR acts as the encoder. Unlike end-to-end data-driven approaches, this structure ensures that the DNN is easy to train and initialize since the DNN only has to learn a simple mapping. The results confirm this and show that the proposed method produces an AR-spectrum that efficiently represents the speech spectrum in terms of the Itakura-Saito divergence, Kullback-Leibler divergence, log-spectral distortion, and speech distortion.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b421e64c5d00d1cb5ce29f7f0b328024 https://vbn.aau.dk/da/publications/9f6296a9-49b4-4caf-843b-424a9bc0b281
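
The description names the classical Levinson-Durbin recursion as the encoder stage and the Itakura-Saito divergence as one of the evaluation metrics. The following is a minimal sketch of those two classical pieces only, assuming NumPy; it does not reproduce the DNN pre-processor or the GABS training structure of the paper, and the function names `levinson_durbin`, `ar_spectrum`, and `itakura_saito` are illustrative choices, not taken from the record.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    r     : autocorrelation sequence r[0..order]
    order : AR model order
    Returns (a, err), where a[0] = 1 and A(z) = 1 + sum_k a[k] z^-k is the
    prediction-error filter, and err is the final prediction error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for m in range(1, order + 1):
        # Correlation between the current predictor and lag m
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err  # reflection coefficient for stage m
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def ar_spectrum(a, err, n_fft):
    """AR power spectrum err / |A(e^{jw})|^2 on the rfft frequency grid."""
    return err / np.abs(np.fft.rfft(a, n_fft)) ** 2


def itakura_saito(p_ref, p_est):
    """Mean Itakura-Saito divergence between two power spectra."""
    ratio = p_ref / p_est
    return float(np.mean(ratio - np.log(ratio) - 1.0))


# Hypothetical check: an AR(1) process with autocorrelation r[k] = 0.5**k
r = np.array([1.0, 0.5, 0.25])
a, err = levinson_durbin(r, order=2)
spectrum = ar_spectrum(a, err, n_fft=64)
```

In this sketch the autocorrelation sequence is supplied directly; in the method described above, the input to the recursion would instead be derived from the output of the DNN pre-processor.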