Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants

Authors: M. S. Arun Sankar, P. S. Sathidevi
Publication year: 2019
Subject:
Source: Arabian Journal for Science and Engineering. 45:1785-1801
ISSN: 2191-4281
2193-567X
DOI: 10.1007/s13369-019-04273-z
Description: In this paper, we propose a variable-bit-rate speech codec based on mixed excitation linear prediction enhanced (MELPe), with an average bit rate of 2 kbps and a better representation of the excitation signal. The order of the prediction filter in the MELPe coding architecture is reduced from 10 to 7 without affecting the perceptual quality of the decoded speech by using the psychoacoustic Mel scale. An efficient two-split vector quantization with a weighted Euclidean distance measure is developed for Mel scale-based linear predictive coding (Mel-LPC), and it requires only 18 bits/frame. The instantaneous pitch, or epoch, which is vital for many speech processing applications, is preserved in this codec by including it in the excitation signal used for reconstructing voiced speech. The quantization scheme developed for glottal closure instants (GCIs) increases the bit requirement for voiced frames by 4–25 bits, depending on the position of the GCIs. To compensate, the Mel-LPC order for both silence and unvoiced frames is brought down to 4 without compromising the perceptual quality of the reconstructed speech. The lowered bit budget is 41 bits/frame for unvoiced frames and 31 bits/frame for silence frames. A further reduction of 10 bits for silence frames is obtained by reducing the number of transmitted parameters and by tuning the quantization bit requirement for each. To categorize the speech frames at the input of the encoder, a neural network-based voiced/unvoiced/silence classification algorithm using a five-dimensional feature set is developed. The experimental results show that the proposed coding scheme operates at an average bit rate of 2 kbps, which is lower than the bit rate of MELPe (2.4 kbps), while achieving a better perceptual score. In addition, the incorporation of Mel-LPC gives better performance in the estimation of formants and GCIs.
Database: OpenAIRE
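
Illustration: The two-split vector quantization with a weighted Euclidean distance mentioned in the description can be sketched roughly as follows. This is only a minimal illustration of the general technique, not the authors' implementation. The split of the order-7 vector into 3 + 4 dimensions, the 9 + 9 bit allocation (chosen only so that the total is 18 bits/frame), the randomly generated codebooks, the uniform weights, and all function names are assumptions made for the example; the record does not specify any of them.

```python
import random

# Hypothetical configuration: the record states only "two-split" and "18 bits/frame".
SPLITS = (3, 4)   # assumed sub-vector dimensions (3 + 4 = order 7)
BITS = (9, 9)     # assumed bits per sub-vector (9 + 9 = 18)


def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance between vector x and codeword c."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def split_vq_encode(vec, weights, codebooks, splits=SPLITS):
    """Quantize each sub-vector independently; return one index per split."""
    indices, start = [], 0
    for codebook, dim in zip(codebooks, splits):
        sub = vec[start:start + dim]
        w = weights[start:start + dim]
        best = min(range(len(codebook)),
                   key=lambda i: weighted_sq_dist(sub, codebook[i], w))
        indices.append(best)
        start += dim
    return indices


def split_vq_decode(indices, codebooks):
    """Concatenate the selected codewords to rebuild the full vector."""
    out = []
    for idx, codebook in zip(indices, codebooks):
        out.extend(codebook[idx])
    return out


# Placeholder codebooks with random entries; a real codec would train them.
rng = random.Random(0)
codebooks = [
    [[rng.random() for _ in range(dim)] for _ in range(2 ** bits)]
    for dim, bits in zip(SPLITS, BITS)
]

vec = [rng.random() for _ in range(sum(SPLITS))]
weights = [1.0] * sum(SPLITS)   # uniform weights as a stand-in

indices = split_vq_encode(vec, weights, codebooks)
decoded = split_vq_decode(indices, codebooks)
total_bits = sum(BITS)
```

Because each sub-vector is searched in its own codebook, the search cost in this sketch is 2 × 512 distance evaluations rather than the 2^18 that a single unsplit codebook of the same bit budget would need, which is the usual motivation for split quantization.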