Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation

Authors:	Kobashikawa Satoshi, Yamaguchi Yoshikazu, Takahashi Satoshi, Masataki Hirokazu, Asami Taichi, Hori Takaaki
Publication year:	2012
Subject:	Reduction (complexity); Matching (statistics); Beam diameter; Voice activity detection; Computational complexity theory; Computer science; Speech recognition; Computation; Slowness; Decoding methods
Source:	SLT
DOI:	10.1109/slt.2012.6424209
Description:	This paper proposes a technique that efficiently controls the beam width to yield practical computation times when auto-transcribing massive volumes of speech. We focus on the fact that a lot of time is wasted recognizing poor-quality speech that will yield, with inordinate slowness, erroneous transcriptions and provide no useful results. To stabilize the time regardless of quality, our proposal controls the beam width based on prolonged score spread against the target speech; it formulates the score range within the width and maximizes computation efficiency by regulating the range relevant to the hypotheses' survival rate. The proposed technique can control the width rapidly by using just monophones prior to decoding. It also restricts the width during decoding by using the processing speed and remaining data time to better handle stubborn speech. Experiments with several SNRs and actual call-center speech confirm a reduction in computation time while matching the accuracy of existing techniques.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::2a18215c38e4cfc8ef08100d9e66f905 https://doi.org/10.1109/slt.2012.6424209