Rank-Index based posteriorgram representation for isolated word recognition
Author: | K. R. Prasanna Kumar, Pabitra Mitra, Anupam Mandal |
---|---|
Year of publication: | 2016 |
Subject: | Dynamic time warping; Matching (graph theory); Computer science; Gaussian; Speech recognition; Posterior probability; Rank (computer programming); Pattern recognition; Electronic mail; ComputingMethodologies_PATTERNRECOGNITION; Word recognition; Artificial intelligence; Representation (mathematics) |
Source: | 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET). |
Description: | This paper examines the recognition performance of approaches using Dynamic Time Warping (DTW) on phonetic and Gaussian posteriorgram templates for an isolated spoken word recognition task. Performance is assessed on two parameters: i) recognition accuracy and ii) execution time. The approaches using phonetic posteriorgram templates exhibit higher recognition accuracy than those using Gaussian posteriorgram templates, while the execution times in the two cases are comparable. Further, the paper proposes a new rank-index based representation of posteriorgram templates, built from the indices of the top components in a ranked list of posterior probability values. The rank-index based representation exhibits higher recognition performance than the representation based on posterior probability values. However, the time taken for the former is almost 45 times more than for the latter on the same isolated word recognition task. The paper also proposes a non-dynamic time warping (non-DTW) approach for modeling and matching spoken words based on rank-index posteriorgram templates. The proposed approach involves unsupervised segmentation of rank-index posteriorgram templates followed by modeling of the obtained segments. Compared to the approaches using DTW, this approach reduces the execution time by ∼98% on the rank-index based representation and by ∼31% on the representation based on posterior probability values, while exhibiting recognition accuracy close to that of the latter in the phonetic posteriorgram case. |
Database: | OpenAIRE |
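The description above mentions a rank-index representation (indices of the top components of each frame's ranked posterior values) matched with DTW. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the choice of `k`, and the local distance (one minus the Jaccard overlap of two index sets) are all assumptions made for illustration; the record does not state which distance or which number of top components the paper uses.

```python
def rank_index(posteriorgram, k=2):
    """Replace each frame by the indices of its k largest posteriors.

    posteriorgram: list of frames, each a list of posterior probabilities.
    k is a hypothetical parameter; ties resolve to the lower index.
    """
    return [
        sorted(range(len(frame)), key=lambda c: -frame[c])[:k]
        for frame in posteriorgram
    ]


def frame_distance(a, b):
    """Assumed local distance: 1 - Jaccard overlap of two index sets."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def dtw_cost(x, y, dist):
    """Accumulated cost of the best DTW alignment between sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            # Standard step pattern: insertion, deletion, or match.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Toy posteriorgram with two frames over three components.
template = rank_index([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], k=2)
print(template)
print(dtw_cost(template, template, frame_distance))
```

In a recognition setting, a test word would be assigned the label of the template with the lowest `dtw_cost`; the quadratic loop in `dtw_cost` is the part a non-DTW approach would aim to avoid.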