Analysis of the influence of sound signal processing parameters on the quality of voice command recognition

Author:	Dyuzhayev, L. P., Koval, V. Yu.
Language:	Ukrainian
Year of publication:	2014
Subject:	mel-cepstral coefficients; speech recognition; dynamic time warping; voice commands; 681.58
Source:	Вісник НТУУ «КПІ». Радіотехніка, радіоапаратобудування: збірник наукових праць
Description:	The paper examines the structure of a voice command recognition system, the algorithm for extracting mel-cepstral coefficients, and their comparison by dynamic time warping. In a system with a dictionary of fifty commands spoken by a single speaker, the effect of the following parameters on voice command recognition quality was studied: sampling rate, frame duration, number of Fourier samples, and type of window function. Introduction. Recognition of single (isolated) voice commands is required for voice control of various devices. Typically, this control method demands high reliability (at least 95% recognition accuracy). Voice commands are also often spoken in noisy conditions. Currently known speech recognition methods and algorithms do not make clear which sound signal parameters give the best results. Main part. At the first level of voice recognition, the signal is preprocessed and acoustic features are extracted. These features have several useful properties: they are easy to compute, they give a compact representation of the voice command, and they are resistant to noise interference. At the next level, the given command is looked up in the reference dictionary. To obtain MFCC coefficients, the input file is divided into frames. Each frame is weighted by a window function and processed by the discrete Fourier transform. The resulting frequency-domain representation of the signal is divided into bands using a set of triangular filters. The last step is the discrete cosine transform. Dynamic time warping yields a value that is inversely related to the degree of similarity between the given command and a reference.
Conclusions. The study showed that, for voice command recognition, the best trade-off between quality and performance is achieved with the following sound signal processing parameters: 8 kHz sampling rate, frame duration of 70-120 ms, Hamming window function, and 512 Fourier samples.
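The processing chain described in the abstract (framing, windowing, discrete Fourier transform, triangular mel filters, discrete cosine transform, then comparison by dynamic time warping) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the 50% frame overlap, the number of filters (20) and cepstral coefficients (12), the logarithm applied before the DCT (standard in MFCC pipelines but not mentioned in the abstract), and the Euclidean frame distance are all assumptions. The abstract also does not say how a 70-120 ms frame (560-960 samples at 8 kHz) is reconciled with 512 Fourier samples; the sketch simply lets `np.fft.rfft` crop each windowed frame to `n_fft` samples.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (one row per filter)."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=8000, frame_ms=80, n_fft=512,
         n_filters=20, n_ceps=12):
    """Return an array of shape (n_frames, n_ceps)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = frame_len // 2                     # assumption: 50% overlap
    window = np.hamming(frame_len)           # Hamming window, as in the conclusions
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    # DCT-II basis (unnormalised), one row per cepstral coefficient
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # assumption: rfft crops frames longer than n_fft to n_fft samples
        power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
        energies = np.log(fbank @ power + 1e-10)   # log step is an assumption
        feats.append(dct @ energies)
    return np.array(feats)


def dtw_distance(a, b):
    """Accumulated DTW cost between two feature sequences (lower = more similar)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # assumption: Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


if __name__ == "__main__":
    # Synthetic stand-ins for recorded commands: two one-second tones at 8 kHz.
    t = np.arange(8000) / 8000.0
    reference = mfcc(np.sin(2 * np.pi * 440 * t))
    candidate = mfcc(np.sin(2 * np.pi * 1200 * t))
    print("same command:", dtw_distance(reference, reference))
    print("different command:", dtw_distance(reference, candidate))
```

In a dictionary-based recogniser of the kind the abstract describes, the input command would be compared against each reference in this way and the reference with the smallest accumulated cost selected; the selection rule is likewise an assumption here.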
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=od______2635::217028e412e485425f87de23774b67ab https://ela.kpi.ua/handle/123456789/8095