Spoken term detection from continuous speech using ANN posteriors and image processing techniques
Authors: | Vikram C M, Arpit Jain, Ravi Shankar, Deepak K T, S R M Prasanna |
---|---|
Year of publication: | 2016 |
Subject: | Dynamic time warping; Computer science; business.industry; Speech recognition; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Word error rate; Image processing; TIMIT; Pattern recognition; Mixture model; ComputingMethodologies_PATTERNRECOGNITION; Mel-frequency cepstrum; Artificial intelligence; Hidden Markov model; business; Test data |
Source: | 2016 Twenty Second National Conference on Communication (NCC). |
DOI: | 10.1109/ncc.2016.7561151 |
Description: | The objective of the current work is to demonstrate the significance of morphological image processing techniques in spoken term detection from continuous speech. The phone posterior probabilities for the reference speech data and the query word are obtained from a hybrid Hidden Markov Model (HMM)-Artificial Neural Network (ANN) phoneme recognizer. The phone posteriors of the query word and the reference data are matched using the non-segmental Dynamic Time Warping (DTW) technique. To decide whether a keyword is present in or absent from a particular reference file, an image-processing-based approach is proposed. The DTW accumulation matrix is viewed as a grayscale image and processed using binarization and skeletonization operations. The keyword is judged to be present when a diagonal streak of dark pixels is observed in the processed image. The phoneme recognizer is trained on the TIMIT training set, and a set of twenty randomly chosen words from the TIMIT test data is used as keywords. The algorithm is evaluated for each keyword against the entire TIMIT test data as the reference, and an accuracy of about 85% with an error rate of less than 8% is noted. |
Database: | OpenAIRE |
External link: |
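
The matching and decision steps outlined in the description can be illustrated with a small sketch. It is not the authors' implementation; the record gives no parameters, so everything specific here is an assumption: the local distance (negative log of the inner product of two posterior vectors), the subsequence-style DTW recursion with a free starting frame, the per-row normalization and the threshold `tau` used for binarization, and the rule that a keyword is present when a strictly diagonal run of dark pixels covers at least `min_fraction` of the query. Skeletonization is omitted for brevity, and `toy_posteriors` is a hypothetical helper that stands in for the output of a phoneme recognizer.

```python
import math


def toy_posteriors(labels, n_classes=3, peak=0.9):
    """Hypothetical stand-in for recognizer output: one smoothed one-hot vector per frame."""
    rest = (1.0 - peak) / (n_classes - 1)
    return [[peak if k == lab else rest for k in range(n_classes)] for lab in labels]


def local_distance(p, q, eps=1e-10):
    """Assumed frame distance: negative log of the inner product of two posterior vectors."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))


def accumulate(query, reference):
    """DTW accumulation matrix in which the query may start at any reference frame."""
    n, m = len(query), len(reference)
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local_distance(query[i], reference[j])
            if i == 0:
                acc[i][j] = d  # free start: no cost is carried into the first query frame
            elif j == 0:
                acc[i][j] = d + acc[i - 1][j]
            else:
                acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc


def binarize(acc, tau):
    """Mark a pixel as dark when its accumulated cost per query frame is below tau."""
    return [[value / (i + 1) < tau for value in row] for i, row in enumerate(acc)]


def longest_diagonal_run(mask):
    """Length of the longest run of dark pixels along any top-left to bottom-right diagonal."""
    n, m = len(mask), len(mask[0])
    run = [[0] * m for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(m):
            if mask[i][j]:
                run[i][j] = 1 + (run[i - 1][j - 1] if i > 0 and j > 0 else 0)
                best = max(best, run[i][j])
    return best


def detect(query, reference, tau=0.5, min_fraction=0.8):
    """Declare the keyword present if a long enough diagonal streak of dark pixels exists."""
    mask = binarize(accumulate(query, reference), tau)
    return longest_diagonal_run(mask) >= math.ceil(min_fraction * len(query))


if __name__ == "__main__":
    query = toy_posteriors([0, 1, 2])
    print(detect(query, toy_posteriors([2, 2, 0, 1, 2, 0, 0])))  # reference contains 0, 1, 2
    print(detect(query, toy_posteriors([2, 2, 2, 0, 0, 0, 2])))  # reference lacks phone 1
```

A strictly diagonal run is a deliberate simplification: real speech is time-warped, so a practical detector would tolerate streaks that bend away from a 45-degree line, which is presumably where thinning the dark region to a skeleton becomes useful.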