Showing 1 - 10 of 31 for search: '"Ĉernocký, Jan ''Honza''"'
Author:
Yusuf, Bolaji, Černocký, Jan "Honza", Saraçlar, Murat
End-to-end (E2E) keyword search (KWS) has emerged as an alternative and complementary approach to conventional keyword search which depends on the output of automatic speech recognition (ASR) systems. While E2E methods greatly simplify the KWS pipeli
External link:
http://arxiv.org/abs/2407.04652
Paralinguistic traits like cognitive load and emotion are increasingly recognized as pivotal areas in speech recognition research, often examined through specialized datasets like CLSE and IEMOCAP. However, the integrity of these datasets is seldom s
External link:
http://arxiv.org/abs/2403.07767
Author:
Baskar, Murali Karthick, Herzig, Tim, Nguyen, Diana, Diez, Mireia, Polzehl, Tim, Burget, Lukáš, Černocký, Jan "Honza"
Dysarthric speech recognition has posed major challenges due to lack of training data and heavy mismatch in speaker characteristics. Recent ASR systems have benefited from readily available pretrained models such as wav2vec2 to improve the recognitio
External link:
http://arxiv.org/abs/2204.00770
Author:
Stafylakis, Themos, Mošner, Ladislav, Plchot, Oldřich, Rohdin, Johan, Silnova, Anna, Burget, Lukáš, Černocký, Jan "Honza"
In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we are using the full VoxCeleb recordings and the name of the celebrities appearing on each video without knowledge of the time
External link:
http://arxiv.org/abs/2203.15436
Author:
Baskar, Murali Karthick, Burget, Lukáš, Watanabe, Shinji, Astudillo, Ramon Fernandez, Černocký, Jan "Honza"
Self-supervised ASR-TTS models suffer in out-of-domain data conditions. Here we propose an enhanced ASR-TTS (EAT) model that incorporates two main features: 1) The ASR$\rightarrow$TTS direction is equipped with a language model reward to penalize the
External link:
http://arxiv.org/abs/2104.07474
Author:
Szoke, Igor, Kesiraju, Santosh, Novotny, Ondrej, Kocour, Martin, Vesely, Karel, Cernocky, Jan "Honza"
We launched a community platform for collecting the ATC speech world-wide in the ATCO2 project. Filtering out unseen non-English speech is one of the main components in the data processing pipeline. The proposed English Language Detection (ELD) syste
External link:
http://arxiv.org/abs/2104.02332
Author:
Kocour, Martin, Cámbara, Guillermo, Luque, Jordi, Bonet, David, Farrús, Mireia, Karafiát, Martin, Veselý, Karel, Černocký, Jan "Honza"
This paper describes joint effort of BUT and Telefónica Research on development of Automatic Speech Recognition systems for Albayzin 2020 Challenge. We compare approaches based on either hybrid or end-to-end models. In hybrid modelling, we explore
External link:
http://arxiv.org/abs/2101.12729
Author:
Zmolikova, Katerina, Delcroix, Marc, Burget, Lukáš, Nakatani, Tomohiro, Černocký, Jan "Honza"
In this paper, we propose a method combining variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model was shown in several wo
External link:
http://arxiv.org/abs/2011.11984
Author:
Karafiát, Martin, Baskar, Murali Karthick, Szöke, Igor, Vydana, Hari Krishna, Veselý, Karel, Černocký, Jan "Honza"
The paper describes the BUT Automatic Speech Recognition (ASR) systems submitted for OpenSAT evaluations under two domain categories: low-resourced languages and public safety communications. The first was challenging due to lack of training d
External link:
http://arxiv.org/abs/2001.11360
DeepMine is a speech database in Persian and English designed to build and evaluate text-dependent, text-prompted, and text-independent speaker verification, as well as Persian speech recognition systems. It contains more than 1850 speakers and 540 t
External link:
http://arxiv.org/abs/1912.03627