Reducing speech recognition costs: By compressing the input data

Authors: Ramin Halavati, Saeed Bagheri Shouraki
Year of publication: 2012
Subject:
Source: IEEE Conf. of Intelligent Systems
Description: One of the key constraints on using embedded speech recognition modules is the computational power they require. To reduce this requirement, we propose an algorithm that clusters the speech signal before passing it to the recognition units. The algorithm is based on agglomerative clustering and produces a sequence of compressed frames optimized for recognition. Our experimental results indicate that the proposed method yields an average frame rate of 40 frames per second on medium- to large-vocabulary isolated-word recognition tasks without loss of recognition accuracy, which results in up to 60% faster recognition compared with the usual fixed-frame-rate sampling at 100 fps. This value is quite close to the theoretically optimal value of 37.5 frames per second, whereas the best result of earlier approaches is about 60 frames per second.
Database: OpenAIRE
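
The description names agglomerative clustering of speech frames but gives neither the merge criterion nor the stopping rule. The following is therefore only a minimal sketch of one generic variant, not the authors' algorithm: it assumes that only neighboring frames may be merged, that similarity is the Euclidean distance between segment means, and that merging stops at a fixed segment count. The names `compress_frames` and `target_count` are hypothetical.

```python
from math import dist


def compress_frames(frames, target_count):
    """Greedily merge adjacent frames until `target_count` segments remain.

    `frames` is a sequence of equal-length feature vectors in time order.
    Returns a list of (mean_vector, frame_count) pairs, still in time order.
    Assumption: Euclidean distance between segment means is the merge cost.
    """
    # Start with one segment per input frame.
    segments = [(list(f), 1) for f in frames]

    while len(segments) > max(target_count, 1):
        # Pick the pair of neighboring segments whose means are closest.
        i = min(
            range(len(segments) - 1),
            key=lambda k: dist(segments[k][0], segments[k + 1][0]),
        )
        (a, na), (b, nb) = segments[i], segments[i + 1]
        # Replace the pair with its frame-count-weighted mean.
        merged = [(x * na + y * nb) / (na + nb) for x, y in zip(a, b)]
        segments[i:i + 2] = [(merged, na + nb)]

    return segments


if __name__ == "__main__":
    # Toy one-dimensional "features": two clearly separated steady regions.
    toy = [[0.0], [0.1], [5.0], [5.1], [5.2]]
    for mean, count in compress_frames(toy, target_count=2):
        print(mean, count)
```

Under these assumptions, one second of input sampled at 100 fps and compressed with `target_count=40` would correspond to the average rate of 40 frames per second reported in the description. A fixed target is a simplification; a distance threshold would let the rate vary with the signal.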