Showing 1 - 10 of 26 for search: '"Medennikov, Ivan"'
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
Authors:
Wang, Jinhan, Wang, Weiqing, Dhawan, Kunal, Park, Taejin, Kim, Myungjong, Medennikov, Ivan, Huang, He, Koluguri, Nithin, Balam, Jagadeesh, Ginsburg, Boris
We propose a novel end-to-end multi-talker automatic speech recognition (ASR) framework that enables both multi-speaker (MS) ASR and target-speaker (TS) ASR. Our proposed model is trained in a fully end-to-end manner, incorporating speaker supervisio
External link:
http://arxiv.org/abs/2409.12352
Authors:
Park, Taejin, Medennikov, Ivan, Dhawan, Kunal, Wang, Weiqing, Huang, He, Koluguri, Nithin Rao, Puvvada, Krishna C., Balam, Jagadeesh, Ginsburg, Boris
We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challe
External link:
http://arxiv.org/abs/2409.06656
Authors:
Wang, Weiqing, Dhawan, Kunal, Park, Taejin, Puvvada, Krishna C., Medennikov, Ivan, Majumdar, Somshubra, Huang, He, Balam, Jagadeesh, Ginsburg, Boris
Speech foundation models have achieved state-of-the-art (SoTA) performance across various tasks, such as automatic speech recognition (ASR) in hundreds of languages. However, multi-speaker ASR remains a challenging task for these models due to data s
External link:
http://arxiv.org/abs/2409.01438
Authors:
Huang, He, Park, Taejin, Dhawan, Kunal, Medennikov, Ivan, Puvvada, Krishna C., Koluguri, Nithin Rao, Wang, Weiqing, Balam, Jagadeesh, Ginsburg, Boris
Self-supervised learning has been proven to benefit a wide range of speech processing tasks, such as speech recognition/translation, speaker verification, and diarization. However, most current approaches are computationally expensive. In this
External link:
http://arxiv.org/abs/2408.13106
Authors:
Mitrofanov, Anton, Korenevskaya, Mariya, Podluzhny, Ivan, Khokhlov, Yuri, Laptev, Aleksandr, Andrusenko, Andrei, Ilin, Aleksei, Korenevsky, Maxim, Medennikov, Ivan, Romanenko, Aleksei
Neural network-based language models are commonly used in rescoring approaches to improve the quality of modern automatic speech recognition (ASR) systems. Most of the existing methods are computationally expensive since they use autoregressive langu
External link:
http://arxiv.org/abs/2104.02526
Authors:
Laptev, Aleksandr, Andrusenko, Andrei, Podluzhny, Ivan, Mitrofanov, Anton, Medennikov, Ivan, Matveev, Yuri
With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. Researchers and industry prefer to use end-to-end ASR systems for on-device speech recogniti
External link:
http://arxiv.org/abs/2103.07186
This paper presents an exploration of end-to-end automatic speech recognition systems (ASR) for the largest open-source Russian language data set -- OpenSTT. We evaluate different existing end-to-end approaches such as joint CTC/Attention, RNN-Transd
External link:
http://arxiv.org/abs/2006.08274
Authors:
Laptev, Aleksandr, Korostik, Roman, Svischev, Aleksey, Andrusenko, Andrei, Medennikov, Ivan, Rybin, Sergey
Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (t
External link:
http://arxiv.org/abs/2005.07157
Authors:
Medennikov, Ivan, Korenevsky, Maxim, Prisyach, Tatiana, Khokhlov, Yuri, Korenevskaya, Mariya, Sorokin, Ivan, Timofeeva, Tatiana, Mitrofanov, Anton, Andrusenko, Andrei, Podluzhny, Ivan, Laptev, Aleksandr, Romanenko, Aleksei
Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used clustering-based diarization approaches perform rather poorly in such conditions, mainly due to the limited ability to handle overlapping speech. We propose
External link:
http://arxiv.org/abs/2005.07272
While end-to-end ASR systems have proven competitive with the conventional hybrid approach, they are prone to accuracy degradation when it comes to noisy and low-resource conditions. In this paper, we argue that, even in such difficult cases, some en
External link:
http://arxiv.org/abs/2004.10799