Spoken term detection from continuous speech using ANN posteriors and image processing techniques

Authors: Vikram C M, Arpit Jain, Ravi Shankar, Deepak K T, S R M Prasanna
Year of publication: 2016
Subject:
Source: 2016 Twenty Second National Conference on Communication (NCC).
DOI: 10.1109/ncc.2016.7561151
Description: This work demonstrates the usefulness of morphological image processing techniques for spoken term detection in continuous speech. Phone posterior probabilities for the reference speech data and the query word are obtained from a hybrid phoneme recognizer based on a Hidden Markov Model (HMM) and an Artificial Neural Network (ANN). The phone posteriors of the query word and the reference data are matched using non-segmental Dynamic Time Warping (DTW). To decide whether a keyword is present in a given reference file, an image-processing-based approach is proposed: the DTW accumulation matrix is treated as a grayscale image and processed with binarization and skeletonization operations. The keyword is declared present when a dark diagonal streak is observed in the processed image. The phoneme recognizer is trained on the TIMIT training set, and twenty randomly chosen words from the TIMIT test data serve as keywords. The algorithm is evaluated for each keyword against the entire TIMIT test data as the reference, and an accuracy of about 85% with an error rate of less than 8% is reported.
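The matching and decision steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the record does not state: the local distance is taken to be the negative log of the posterior dot product, binarization thresholds the accumulated cost divided by the query frame index, and the "dark diagonal streak" is approximated by the longest run of low-cost cells along any diagonal. The function names, the threshold of 0.5, the 0.8 run fraction, and the toy posteriors are all hypothetical. Skeletonization is left out to keep the sketch dependency-free; a thinning routine such as scikit-image's `skeletonize` could be applied to the binary mask before the run check.

```python
import numpy as np


def posteriors_from_labels(labels, n_phones, peak=0.9):
    """Toy stand-in for recognizer output: one posterior vector per frame."""
    off = (1.0 - peak) / (n_phones - 1)
    post = np.full((len(labels), n_phones), off)
    post[np.arange(len(labels)), labels] = peak
    return post


def accumulation_matrix(query, reference):
    """Non-segmental DTW: the path may start at any reference frame."""
    # Assumed local distance: negative log of the posterior dot product.
    dist = -np.log(query @ reference.T)
    n, m = dist.shape
    acc = np.empty_like(dist)
    acc[0, :] = dist[0, :]  # free start anywhere in the reference
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best
    return acc


def binarize(acc, threshold=0.5):
    """Mark 'dark' cells: low accumulated cost per query frame (assumed rule)."""
    rows = np.arange(1, acc.shape[0] + 1)[:, None]
    return (acc / rows) <= threshold


def longest_diagonal_run(mask):
    """Length of the longest unbroken run of True cells along any diagonal."""
    best = 0
    n, m = mask.shape
    for offset in range(-(n - 1), m):
        run = 0
        for cell in np.diagonal(mask, offset=offset):
            run = run + 1 if cell else 0
            best = max(best, run)
    return best


def keyword_present(query, reference, min_fraction=0.8):
    """Declare a hit when the streak covers most of the query frames."""
    mask = binarize(accumulation_matrix(query, reference))
    return longest_diagonal_run(mask) >= min_fraction * query.shape[0]


# Toy data: six phone classes, one frame per phone.
query = posteriors_from_labels([0, 1, 2, 3, 4, 5], n_phones=6)
ref_hit = posteriors_from_labels([5, 5, 3, 3, 0, 1, 2, 3, 4, 5, 1, 1], n_phones=6)
ref_miss = posteriors_from_labels([5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1, 0], n_phones=6)
hit = keyword_present(query, ref_hit)
miss = keyword_present(query, ref_miss)
```

In this toy setup `ref_hit` contains the query phone sequence starting at frame 4, while `ref_miss` contains the same phones in reverse order, so only the first produces a full-length diagonal streak. Real posteriors are far noisier, and both thresholds would need tuning on held-out data.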
Database: OpenAIRE