Author:
Behre, Piyush; Tan, Sharman; Varadharajan, Padma; Chang, Shuangyu
Publication year:
2022
Subject:
Source:
8th International Conference on Signal, Image Processing and Embedded Systems (SIGEM 2022), Volume 12, Number 20, November 2022
Document type:
Working Paper
Description:
While speech recognition Word Error Rate (WER) has reached human parity for English, long-form dictation scenarios still suffer from segmentation and punctuation problems caused by irregular pausing patterns or slow speakers. Transformer sequence-tagging models are effective at capturing long bidirectional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, which makes it hard to incorporate the right context when making punctuation decisions. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation issues, improving the segmentation F0.5-score by 13.9%. Streaming punctuation also achieves an average BLEU-score improvement of 0.66 on the downstream task of Machine Translation (MT).
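
The abstract does not give implementation details, but the dynamic-decoding-window idea can be illustrated with a rough sketch. Everything below is an assumption for illustration rather than the authors' system: the tag_punctuation stub stands in for a Transformer sequence-tagging model, and the max_window and hold_back parameters are hypothetical. The sketch buffers incoming ASR tokens, re-tags the whole buffer once it grows past a window size, and commits punctuation only for tokens that already have enough right context; the remaining tail is carried into the next window so it can be re-punctuated with fresh bidirectional context.

from typing import Iterable, Iterator, List, Tuple

def tag_punctuation(tokens: List[str]) -> List[str]:
    # Hypothetical stand-in for a Transformer punctuation tagger: returns one
    # punctuation string per token ("" means no punctuation after the token).
    return ["" for _ in tokens]

def stream_punctuate(
    token_stream: Iterable[str],
    max_window: int = 64,   # re-decode once the buffer reaches this many tokens
    hold_back: int = 16,    # tail kept uncommitted as right context for later
) -> Iterator[Tuple[str, str]]:
    # Buffer incoming ASR tokens; commit punctuation only for tokens that
    # already have hold_back tokens of right context behind them.
    buffer: List[str] = []
    for token in token_stream:
        buffer.append(token)
        if len(buffer) >= max_window:
            labels = tag_punctuation(buffer)
            commit = len(buffer) - hold_back
            for tok, lab in zip(buffer[:commit], labels[:commit]):
                yield tok, lab
            buffer = buffer[commit:]  # tail is re-decoded with fresh context
    # End of stream: flush whatever remains, now with full right context.
    if buffer:
        for tok, lab in zip(buffer, tag_punctuation(buffer)):
            yield tok, lab

if __name__ == "__main__":
    words = "so the next item on the agenda is the quarterly report".split()
    pieces = [tok + punct for tok, punct in
              stream_punctuate(iter(words), max_window=8, hold_back=3)]
    print(" ".join(pieces))

With a real tagger in place of the stub, lowering hold_back reduces latency at the cost of less right context per punctuation decision, which is the trade-off the streaming approach is designed to manage.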
Database:
arXiv |
External link: