Search results: "Laptev, Aleksandr"

Report

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter

Authors: Andrusenko, Andrei; Laptev, Aleksandr; Bataev, Vladimir; Lavrukhin, Vitaly; Ginsburg, Boris

Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods involve modification of the ASR model or the beam-search decoding algorithm, complicating…

External link: http://arxiv.org/abs/2406.07096

Report

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

Authors: Park, Tae Jin; Huang, He; Jukic, Ante; Dhawan, Kunal; Puvvada, Krishna C.; Koluguri, Nithin; Karpov, Nikolay; Laptev, Aleksandr; Balam, Jagadeesh; Ginsburg, Boris

Published in: CHiME-7 Workshop 2023

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to…

External link: http://arxiv.org/abs/2310.12378

Report

Confidence-based Ensembles of End-to-End Speech Recognition Models

Authors: Gitman, Igor; Lavrukhin, Vitaly; Laptev, Aleksandr; Ginsburg, Boris

The number of end-to-end speech recognition models grows every year. These models are often adapted to new domains or languages resulting in a proliferation of expert systems that achieve great results on target data, while generally showing inferior…

External link: http://arxiv.org/abs/2306.15824

Report

Powerful and Extensible WFST Framework for RNN-Transducer Losses

Authors: Laptev, Aleksandr; Bataev, Vladimir; Gitman, Igor; Ginsburg, Boris

This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug…

External link: http://arxiv.org/abs/2303.10384

Report

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition

Authors: Laptev, Aleksandr; Ginsburg, Boris

This paper presents a class of new fast non-trainable entropy-based confidence estimation methods for automatic speech recognition. We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word…

External link: http://arxiv.org/abs/2212.08703
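
The abstract above describes normalizing per-frame entropy and aggregating it into unit- and word-level confidence. The following is a minimal Python sketch of that general idea, not the paper's method. It assumes Shannon entropy, linear normalization by log V (V being the vocabulary size), and min/mean/product aggregation over the frames aligned to a word. The function names are hypothetical, and the entropy variants, normalizations and aggregations actually studied in the paper may differ.

```python
import math
from typing import List, Sequence


def frame_confidence(probs: Sequence[float]) -> float:
    """Map one frame's output distribution to [0, 1] as 1 - H(p) / log(V).

    Assumption for illustration: Shannon entropy with linear normalization.
    A one-hot frame scores 1.0; a uniform frame scores (about) 0.0.
    """
    v = len(probs)
    if v < 2:
        return 1.0  # a single-symbol vocabulary carries no uncertainty
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(v)


def word_confidence(frames: List[Sequence[float]], mode: str = "min") -> float:
    """Aggregate the confidences of the frames aligned to one word (hypothetical helper)."""
    if not frames:
        raise ValueError("a word needs at least one frame")
    scores = [frame_confidence(f) for f in frames]
    if mode == "min":
        return min(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    if mode == "prod":
        return math.prod(scores)
    raise ValueError(f"unknown aggregation mode: {mode}")


if __name__ == "__main__":
    # Two frames assumed to belong to one word: one confident, one nearly uniform.
    word = [[0.97, 0.01, 0.01, 0.01], [0.30, 0.25, 0.25, 0.20]]
    for mode in ("min", "mean", "prod"):
        print(mode, round(word_confidence(word, mode), 3))
```

With `min`, a single uncertain frame caps the whole word's score; `mean` smooths over frames instead. Which aggregation works best is an empirical question this sketch does not settle.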

Report

CTC Variations Through New WFST Topologies

Authors: Laptev, Aleksandr; Majumdar, Somshubra; Ginsburg, Boris

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) the "compact-CTC", in which…

External link: http://arxiv.org/abs/2110.03098

Report

LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

Authors: Mitrofanov, Anton; Korenevskaya, Mariya; Podluzhny, Ivan; Khokhlov, Yuri; Laptev, Aleksandr; Andrusenko, Andrei; Ilin, Aleksei; Korenevsky, Maxim; Medennikov, Ivan; Romanenko, Aleksei

Neural network-based language models are commonly used in rescoring approaches to improve the quality of modern automatic speech recognition (ASR) systems. Most of the existing methods are computationally expensive since they use autoregressive language…

External link: http://arxiv.org/abs/2104.02526

Report

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Authors: Laptev, Aleksandr; Andrusenko, Andrei; Podluzhny, Ivan; Mitrofanov, Anton; Medennikov, Ivan; Matveev, Yuri

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. Researchers and industry prefer to use end-to-end ASR systems for on-device speech recognition…

External link: http://arxiv.org/abs/2103.07186

Report

Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset

Authors: Andrusenko, Andrei; Laptev, Aleksandr; Medennikov, Ivan

This paper presents an exploration of end-to-end automatic speech recognition systems (ASR) for the largest open-source Russian language data set -- OpenSTT. We evaluate different existing end-to-end approaches such as joint CTC/Attention, RNN-Transducer…

External link: http://arxiv.org/abs/2006.08274

Report

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

Authors: Laptev, Aleksandr; Korostik, Roman; Svischev, Aleksey; Andrusenko, Andrei; Medennikov, Ivan; Rybin, Sergey

Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis…

External link: http://arxiv.org/abs/2005.07157