Autoregressive Parameter Estimation with DNN-based Pre-processing
Author: | Zihao Cui, Mads Grasboll Christensen, Changchun Bao, Jesper Kjar Nielsen |
---|---|
Language: | English |
Publication year: | 2020 |
Subject: |
Artificial neural network; Computer science; Recursion (computer science); Pattern recognition; Engineering and technology; Speech processing; Signal; Speech-language pathology & audiology; Medical and health sciences; Autoregressive model; Distortion; Generalized analysis-by-synthesis; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Other medical science; Divergence (statistics); Encoder; Auto-regressive model; DNN; Levinson-Durbin recursion |
Source: | Cui, Z, Bao, C, Nielsen, J K & Christensen, M G 2020, Autoregressive Parameter Estimation with DNN-based Pre-processing. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 9053755, IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 6759-6763, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 04/05/2020. https://doi.org/10.1109/ICASSP40776.2020.9053755 |
DOI: | 10.1109/ICASSP40776.2020.9053755 |
Description: | In this paper, a method for estimating the autoregressive parameters from a signal segment is proposed. The method is based on a deep neural network (DNN) in combination with the classical Levinson-Durbin recursion (LDR). The DNN acts as a pre-processor for the LDR and can be trained on different metrics commonly encountered in speech processing using a generalized analysis-by-synthesis (GABS) structure where the LDR acts as the encoder. Unlike end-to-end data-driven approaches, this structure ensures that the DNN is easy to train and initialize since the DNN only has to learn a simple mapping. The results confirm this and show that the proposed method produces an AR-spectrum that efficiently represents the speech spectrum in terms of the Itakura-Saito divergence, Kullback-Leibler divergence, log-spectral distortion, and speech distortion. |
Database: | OpenAIRE |
External link: |
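
For readers unfamiliar with the terms in the description, the following is a minimal sketch of the classical Levinson-Durbin recursion, the AR spectrum it implies, and the Itakura-Saito divergence between two power spectra. It does not reproduce the paper's DNN pre-processor or its GABS training structure. The function names (`autocorrelation`, `levinson_durbin`, `ar_spectrum`, `itakura_saito`), the biased autocorrelation estimator, and the FFT size are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation estimates r[0..order] of a 1-D segment (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns (a, err) where a[0] == 1 and err is the final prediction-error power.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and the lag-i sample.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient of stage i
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def ar_spectrum(a, err, n_fft=512):
    """AR power spectrum err / |A(e^{jw})|^2 on the rfft frequency grid."""
    return err / np.abs(np.fft.rfft(a, n_fft)) ** 2


def itakura_saito(p, p_hat):
    """Mean Itakura-Saito divergence between two strictly positive power spectra."""
    ratio = np.asarray(p, dtype=float) / np.asarray(p_hat, dtype=float)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


# Usage: exact autocorrelation of a unit-power AR(1) process x[n] = 0.5 x[n-1] + e[n].
r_example = np.array([1.0, 0.5, 0.25])
a_example, err_example = levinson_durbin(r_example, order=2)
spectrum_example = ar_spectrum(a_example, err_example, n_fft=8)
```

In this sketch the recursion takes autocorrelation values as its input; in the method described above, the DNN would sit in front of this step as a pre-processor, and a divergence such as Itakura-Saito would serve as one of the training metrics.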