A fast approach to spoken term detection based on prosodic dynamic features

Authors:	Lei Wang, Xuejiao Tan
Year of publication:	2015
Subject:	Training set; Computer science; business.industry; Speech recognition; Template matching; Feature vector; Computation; Gaussian; Frame (networking); Process (computing); Pattern recognition; Term (time); symbols.namesake; symbols; Artificial intelligence; business
Source:	2015 IEEE International Conference on Progress in Informatics and Computing (PIC).
DOI:	10.1109/pic.2015.7489917
Abstract:	Model-based spoken term detection usually requires a large amount of annotated training data. When sufficient training data are lacking, DTW-based methods are a better choice. However, both model-based and classical DTW-based methods rely on frame-by-frame template matching, so the computational load is heavy and search efficiency is poor. We propose a fast two-stage approach to spoken term detection. In the first stage, prosodic dynamic features are used to rapidly locate hypothesized spoken term regions; in the second stage, Gaussian posteriorgrams are used to verify these local hypothesized regions more precisely. Since each prosodic feature vector contains only three dimensions and represents several consecutive frames of speech at once, segment-based rather than frame-based template matching can be performed, accelerating the whole keyword detection process. The two-stage method thus exploits both the long-term and short-term characteristics of speech. An experiment demonstrates that, under the same conditions, our method improves detection speed while achieving similar detection performance.
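The two-stage search summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the prosodic features, the distance measures, or how segments map to frames. The following are assumptions made for illustration only: subsequence DTW is used in both stages; stage one compares 3-D prosodic vectors with Euclidean distance; stage two compares posteriorgram frames with the negative log inner product (a common choice for posteriorgram matching, assumed here); each prosodic vector spans a fixed, hypothetical `seg_len` frames; and only the single best stage-one region is verified. The helper names `subsequence_dtw` and `two_stage_detect` and the parameters `margin` and `threshold` are hypothetical.

```python
import numpy as np

def subsequence_dtw(query, target, dist):
    """Align all of `query` to the best contiguous stretch of `target`.

    Returns (cost / len(query), start, end) with start/end as target indices.
    Allowed steps: (i-1, j), (i-1, j-1), (i, j-1); the match may begin and
    end anywhere in the target. Dividing by len(query) is a simplification.
    """
    n, m = len(query), len(target)
    cost = np.array([[dist(q, t) for t in target] for q in query])
    acc = np.full((n, m), np.inf)
    start = np.zeros((n, m), dtype=int)
    acc[0, :] = cost[0, :]            # free start anywhere in the target
    start[0, :] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            cands = [(acc[i - 1, j], start[i - 1, j])]
            if j > 0:
                cands.append((acc[i - 1, j - 1], start[i - 1, j - 1]))
                cands.append((acc[i, j - 1], start[i, j - 1]))
            best, s = min(cands, key=lambda c: c[0])
            acc[i, j] = cost[i, j] + best
            start[i, j] = s
    end = int(np.argmin(acc[-1]))     # free end anywhere in the target
    return float(acc[-1, end] / n), int(start[-1, end]), end

def euclidean(a, b):
    # Assumed distance between two 3-D prosodic segment vectors.
    return float(np.linalg.norm(a - b))

def neg_log_inner(p, q, eps=1e-10):
    # Assumed distance between two posteriorgram frames: -log(p . q).
    return float(-np.log(max(np.dot(p, q), eps)))

def two_stage_detect(q_pros, u_pros, q_post, u_post, seg_len, margin, threshold):
    # Stage 1 (hypothetical): cheap segment-level match on prosodic vectors.
    _, s, e = subsequence_dtw(q_pros, u_pros, euclidean)
    lo = max(0, s * seg_len - margin)
    hi = min(len(u_post), (e + 1) * seg_len + margin)
    # Stage 2 (hypothetical): frame-level verification on posteriorgrams,
    # restricted to the candidate frames [lo, hi).
    score, fs, fe = subsequence_dtw(q_post, u_post[lo:hi], neg_log_inner)
    return score <= threshold, score, lo + fs, lo + fe

# Synthetic toy data: 4 "Gaussians", one-hot posteriorgrams.
onehot = np.eye(4)
q_post = onehot[[0, 0, 1, 1, 2, 2, 3, 3]]                      # 8 query frames
u_post = onehot[[3] * 8 + [0, 0, 1, 1, 2, 2, 3, 3] + [0] * 8]  # term at frames 8..15
q_pros = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)         # 2 segments x 3 dims
u_pros = np.array([[5, 5, 5], [5, 5, 5], [1, 2, 3],
                   [4, 5, 6], [9, 9, 9], [9, 9, 9]], dtype=float)
hit, score, first, last = two_stage_detect(q_pros, u_pros, q_post, u_post,
                                           seg_len=4, margin=2, threshold=1.0)
print(hit, round(score, 3), first, last)
```

The speed argument is visible in the sizes: stage one aligns 2 × 6 prosodic vectors instead of 8 × 24 frames, and stage two runs only on the 12 frames around the stage-one hit rather than the full utterance.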
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::f5badfc653690daa51f1e71a779e1fbc https://doi.org/10.1109/pic.2015.7489917