Showing 1 - 10 of 26 for search: '"Laptev, Aleksandr"'
Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods involve modification of the ASR model or the beam-search decoding algorithm, complicatin…
External link:
http://arxiv.org/abs/2406.07096
Author:
Park, Tae Jin, Huang, He, Jukic, Ante, Dhawan, Kunal, Puvvada, Krishna C., Koluguri, Nithin, Karpov, Nikolay, Laptev, Aleksandr, Balam, Jagadeesh, Ginsburg, Boris
Published in:
CHiME-7 Workshop 2023
We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored t…
External link:
http://arxiv.org/abs/2310.12378
The number of end-to-end speech recognition models grows every year. These models are often adapted to new domains or languages resulting in a proliferation of expert systems that achieve great results on target data, while generally showing inferior…
External link:
http://arxiv.org/abs/2306.15824
This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug…
External link:
http://arxiv.org/abs/2303.10384
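For reference, the quantity such implementations compute is the standard RNN-T loss. Below is a minimal NumPy sketch of its forward (alpha) recursion, written for readability rather than speed. It is not the WFST-based framework the paper describes; the function names `log_softmax` and `rnnt_loss` and the (T, U + 1, V) layout of the joint-network log-probabilities are assumptions for illustration.

```python
from typing import Sequence

import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def rnnt_loss(log_probs: np.ndarray, labels: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood of `labels` given joint-network log-probs.

    Hypothetical layout: log_probs has shape (T, U + 1, V), where entry [t, u]
    is the output distribution after t frames and u emitted labels.
    alpha[t, u] is the log-prob of reaching lattice node (t, u); a blank
    advances t, a label advances u.
    """
    T, U1, _ = log_probs.shape
    U = len(labels)
    if U1 != U + 1:
        raise ValueError("log_probs must have shape (T, len(labels) + 1, V)")
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at node (t - 1, u) ...
            via_blank = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else -np.inf
            # ... or by emitting label y_u at node (t, u - 1).
            via_label = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(via_blank, via_label)
    # A final blank from the last node terminates every alignment.
    return float(-(alpha[T - 1, U] + log_probs[T - 1, U, blank]))


rng = np.random.default_rng(0)
example_log_probs = log_softmax(rng.normal(size=(4, 3, 5)))  # T=4, U=2, V=5
print(rnnt_loss(example_log_probs, labels=[3, 1]))
```

Because each step is plain array arithmetic, a loss variant can be prototyped by editing the two transition terms, at the cost of running far slower than a dedicated GPU kernel.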
Author:
Laptev, Aleksandr, Ginsburg, Boris
This paper presents a class of new fast non-trainable entropy-based confidence estimation methods for automatic speech recognition. We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per…
External link:
http://arxiv.org/abs/2212.08703
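A minimal NumPy sketch of the normalize-then-aggregate idea, assuming Gibbs (Shannon) entropy, linear normalization by its maximum value log V, and mean/min/product aggregation over the frames of one unit. The abstract above is truncated, so the entropy family, normalization, and aggregation used in the paper may differ; the names `frame_confidence` and `aggregate_confidence` are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis (keeps log-probs finite)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def frame_confidence(logits: np.ndarray) -> np.ndarray:
    """Per-frame confidence in [0, 1] from (T, V) logits.

    Gibbs entropy H = -sum_v p_v log p_v is at most log V, so the assumed
    linear normalization 1 - H / log V maps a uniform distribution to 0
    and a one-hot distribution to 1.
    """
    log_probs = log_softmax(logits)
    entropy = -np.sum(np.exp(log_probs) * log_probs, axis=-1)
    return 1.0 - entropy / np.log(logits.shape[-1])


def aggregate_confidence(frame_conf: np.ndarray, method: str = "mean") -> float:
    """Combine the frame confidences of one unit (e.g. a word) into one score."""
    if method == "mean":
        return float(np.mean(frame_conf))
    if method == "min":
        return float(np.min(frame_conf))
    if method == "prod":
        return float(np.prod(frame_conf))
    raise ValueError(f"unknown aggregation method: {method}")
```

A word-level score is obtained by passing the confidences of the frames aligned to that word to `aggregate_confidence`; `min` and `prod` penalize a single uncertain frame more strongly than `mean`.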
This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) the "compact-CTC", in whi…
External link:
http://arxiv.org/abs/2110.03098
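As a baseline for the proposed variants, the sketch below builds the standard (non-compact) CTC topology as a deterministic transducer in plain Python: state 0 follows a blank or the start, state k follows token k, repeated tokens emit nothing, and a blank returns to state 0. It uses dictionaries rather than a real WFST library and does not implement compact-CTC or the other proposed variants; `build_ctc_topology`, `num_arcs`, and `transduce` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

BLANK = 0  # label 0 is the CTC blank; tokens are 1..num_tokens

# topology[state][input_label] -> (next_state, output_label or None for epsilon)
Topology = Dict[int, Dict[int, Tuple[int, Optional[int]]]]


def build_ctc_topology(num_tokens: int) -> Topology:
    """Standard CTC topology: state 0 = after blank/start, state k = after token k."""
    topology: Topology = {}
    for state in range(num_tokens + 1):
        arcs: Dict[int, Tuple[int, Optional[int]]] = {}
        for label in range(num_tokens + 1):
            if label == BLANK:
                arcs[label] = (0, None)  # blank: return to state 0, emit nothing
            elif label == state:
                arcs[label] = (state, None)  # repeated token: stay, emit nothing
            else:
                arcs[label] = (label, label)  # new token: move to its state, emit it
        topology[state] = arcs
    return topology


def num_arcs(topology: Topology) -> int:
    return sum(len(arcs) for arcs in topology.values())


def transduce(topology: Topology, frames: Sequence[int]) -> List[int]:
    """Map a frame-level label sequence to its token sequence (all states final)."""
    state = 0
    output: List[int] = []
    for label in frames:
        state, out = topology[state][label]
        if out is not None:
            output.append(out)
    return output


topo = build_ctc_topology(num_tokens=3)
print(num_arcs(topo))                          # 16
print(transduce(topo, [1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

With K tokens this topology has (K + 1)^2 arcs, since every state has one outgoing arc per input label; that quadratic growth is worth keeping in mind when comparing it with more compact topologies.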
Author:
Mitrofanov, Anton, Korenevskaya, Mariya, Podluzhny, Ivan, Khokhlov, Yuri, Laptev, Aleksandr, Andrusenko, Andrei, Ilin, Aleksei, Korenevsky, Maxim, Medennikov, Ivan, Romanenko, Aleksei
Neural network-based language models are commonly used in rescoring approaches to improve the quality of modern automatic speech recognition (ASR) systems. Most of the existing methods are computationally expensive since they use autoregressive langu…
External link:
http://arxiv.org/abs/2104.02526
Author:
Laptev, Aleksandr, Andrusenko, Andrei, Podluzhny, Ivan, Mitrofanov, Anton, Medennikov, Ivan, Matveev, Yuri
With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. Researchers and industry prefer to use end-to-end ASR systems for on-device speech recogniti…
External link:
http://arxiv.org/abs/2103.07186
This paper presents an exploration of end-to-end automatic speech recognition systems (ASR) for the largest open-source Russian language data set, OpenSTT. We evaluate different existing end-to-end approaches such as joint CTC/Attention, RNN-Transd…
External link:
http://arxiv.org/abs/2006.08274
Author:
Laptev, Aleksandr, Korostik, Roman, Svischev, Aleksey, Andrusenko, Andrei, Medennikov, Ivan, Rybin, Sergey
Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (t…
External link:
http://arxiv.org/abs/2005.07157