Showing 1 - 10 of 22 for search: '"Bakhturina, Evelina"'
Authors:
Xu, Peng, Ping, Wei, Wu, Xianchao, McAfee, Lawrence, Zhu, Chen, Liu, Zihan, Subramanian, Sandeep, Bakhturina, Evelina, Shoeybi, Mohammad, Catanzaro, Bryan
Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, whi
External link:
http://arxiv.org/abs/2310.03025
Authors:
Meister, Aleksandr, Novikov, Matvei, Karpov, Nikolay, Bakhturina, Evelina, Lavrukhin, Vitaly, Ginsburg, Boris
Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format. Simultaneously, t
External link:
http://arxiv.org/abs/2310.02943
Authors:
Zhang, Yang, Bartley, Travis M., Graterol-Fuenmayor, Mariana, Lavrukhin, Vitaly, Bakhturina, Evelina, Ginsburg, Boris
Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large-Language Models (LLM) for tex
External link:
http://arxiv.org/abs/2309.13426
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition (ASR) quality given user vocabulary. To deal with large user vocabularies, most of these models include candidate retrieval mechanisms,
External link:
http://arxiv.org/abs/2306.02317
Grapheme-to-phoneme (G2P) transduction is part of the standard text-to-speech (TTS) pipeline. However, G2P conversion is difficult for languages that contain heteronyms -- words that have one spelling but can be pronounced in multiple ways. G2P datas
External link:
http://arxiv.org/abs/2302.14523
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms. One can c
External link:
http://arxiv.org/abs/2208.00064
Text normalization (TN) systems in production are largely rule-based using weighted finite-state transducers (WFST). However, WFST-based systems struggle with ambiguous input when the normalized form is context-dependent. On the other hand, neural te
External link:
http://arxiv.org/abs/2203.15917
Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or ITN, rangi
External link:
http://arxiv.org/abs/2108.09889
Dialogue state tracking is an essential part of goal-oriented dialogue systems, while most of these state tracking models often fail to handle unseen services. In this paper, we propose SGD-QA, a simple and extensible model for schema-guided dialogue
External link:
http://arxiv.org/abs/2105.08049
Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted finite-state transduc
External link:
http://arxiv.org/abs/2104.05055