A Study on Speech Signal Processing Using Wavelet Transforms

Author: Shi-Huang Chen, 陳璽煌
Year of publication: 2002
Document type: Thesis (學位論文)
Description: 90
Wavelet transforms and their theory are among the most exciting developments of the last decade. The wavelet transform has in fact been developed independently in various fields, such as signal processing, image processing, audio processing, communication, and applied mathematics. Because the wavelet representation offers efficient time-frequency localization and multi-resolution analysis, wavelet transforms are well suited to processing non-stationary signals such as speech. This thesis therefore focuses on wavelet-based speech signal processing and proposes a framework for speech signal processing using the wavelet transform. Based on the proposed framework, the thesis develops four new wavelet-based speech signal processing algorithms: pitch detection, consonant/vowel (C/V) segmentation, speech enhancement, and voice activity detection (VAD). Furthermore, to cancel the aliasing distortion that arises in the filterbank structure of wavelet transforms, the thesis also proposes an aliasing compensation algorithm. The first part of the thesis presents the wavelet-based pitch detection algorithm. It applies the aliasing-compensated wavelet transform and a modified spatial correlation function to improve the robustness of conventional pitch detection algorithms in noisy environments. Experimental results show that the proposed pitch detection algorithm performs better than conventional algorithms in both clean and noisy environments. The second part presents the wavelet-based C/V segmentation algorithm. This novel algorithm detects the C/V segmentation point directly by using the product function and its energy profile. Unlike conventional C/V segmentation algorithms, the proposed algorithm requires neither a pitch detector nor backward processing. As a consequence, its accuracy is substantially higher than that of conventional approaches. In the third part, the thesis proposes a wavelet-based speech enhancement method based on perceptual wavelet packet decomposition (PWPD) and time-adapted thresholding (TAT) in order to increase the perceptual speech quality after enhancement. With these improved techniques, the over-thresholding of speech segments that usually occurs in conventional speech enhancement schemes can be avoided. In addition, the method does not require a complicated estimation of the noise level or any knowledge of the SNR. Experimental results using both additive and real noise demonstrate that the proposed speech enhancement method is capable of outperforming conventional noise cancellation schemes. Finally, the thesis applies the TAT algorithm developed in the third part to VAD. This new wavelet-based VAD method likewise requires neither a complicated estimation of the noise level nor any knowledge of the SNR. Experimental results show that it achieves an accurate detection rate even when the speech signal is seriously contaminated by background noise.
Database: Networked Digital Library of Theses & Dissertations
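
As a rough illustration of the wavelet thresholding idea that the abstract refers to, the following is a minimal Python sketch of generic wavelet shrinkage: a signal is decomposed with a multi-level transform, the detail coefficients are soft-thresholded, and the signal is reconstructed. It is not the PWPD or TAT method of the thesis, whose details are not given in this record. The Haar wavelet, the single fixed threshold, and all function names (`haar_dwt`, `haar_idwt`, `soft_threshold`, `denoise`) are assumptions chosen only for illustration.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input).

    Returns (approximation, detail) coefficient lists of half the length.
    """
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, threshold, levels=2):
    """Hypothetical helper: multi-level Haar shrinkage with one fixed threshold.

    Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, threshold))
    # Reconstruct from the coarsest level back to the finest.
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx
```

With a threshold of zero, `denoise` reconstructs its input up to floating-point rounding. A larger threshold suppresses small detail coefficients, which in this simple model are assumed to be dominated by noise. A time-adapted scheme such as the one the thesis describes would presumably vary the threshold over time rather than keep it fixed, but that behaviour is not modelled here.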