Showing 1 - 10
of 1,491
for search: '"Bataev, A. A."'
We present Hybrid-Autoregressive INference TrANsducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor networ
External link:
http://arxiv.org/abs/2410.02597
Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods involve modification of the ASR model or the beam-search decoding algorithm, complicatin
External link:
http://arxiv.org/abs/2406.07096
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop iterates over lab
External link:
http://arxiv.org/abs/2406.06220
The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding. Current state-of-the-art RNN-T decoding implementations leave the GPU idle ~80% of the time. Leveraging a new CUDA 12.4 feature, CUDA graph conditional
External link:
http://arxiv.org/abs/2406.03791
Author:
Kulebyakina, Evgeniya V., Skorikov, Mikhail L., Kolobkova, Elena V., Kuznetsova, Maria S., Bataev, Matvei N., Yakovlev, Dmitri R., Belykh, Vasilii V.
Lead halide perovskite nanocrystals (NCs) in a glass matrix combine excellent optical properties and stability against environment. The spectral and temporal characteristics of photoluminescence from CsPbBr$_3$ and CsPb(Cl,Br)$_3$ nanocrystals (NCs)
External link:
http://arxiv.org/abs/2312.16685
This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug
External link:
http://arxiv.org/abs/2303.10384
Author:
Bataev, I.A., Riabinkina, P.A., Emurlaev, K.I., Golovin, E.D., Lazurenko, D.V., Chen, P., Bataeva, Z.B., Ogneva, T.S., Nasennik, I.E., Bataev, A.A.
Published in:
In Journal of Materials Processing Tech. November 2024 332
We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both. The proposed model uses an integrated auxiliary block for text-based training. This block combine
External link:
http://arxiv.org/abs/2302.14036
Author:
Kuznetsova, Maria S., Kolobkova, Elena V., Bataev, Matvey N., Berdnikov, Vladimir S., Pankin, Dmitrii V., Smirnov, Mikhail B., Ubyivovk, Evgenii V., Ignatiev, Ivan V.
Published in:
Journal of Chemical Physics; 9/28/2024, Vol. 161 Issue 12, p1-11, 11p
Author:
Lazurenko, D.V., Dovzhenko, G.D., Emurlaev, K.I., Shikalov, V.S., Alexandrova, N.S., Domarov, E.V., Ruktuev, A.A., Kuzmin, R.I., Bataev, I.A.
Published in:
In Journal of Alloys and Compounds 25 November 2024 1006