Showing 1 - 10
of 369
for the search: '"Švec, Jan"'
The paper presents a method for spoken term detection based on the Transformer architecture. We propose the encoder-encoder architecture employing two BERT-like encoders with additional modifications, including convolutional and upsampling layers, at…
External link:
http://arxiv.org/abs/2211.01089
Published in:
Švec, J., Šmídl, L., Psutka, J.V., Pražák, A. (2021) Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings. Proc. Interspeech 2021, 4398-4402
The paper describes a novel approach to Spoken Term Detection (STD) in large spoken archives using deep LSTM networks. The work is based on the previous approach of using Siamese neural networks for STD and naturally extends it to directly localize a…
External link:
http://arxiv.org/abs/2210.11895
Published in:
Švec, J., Lehečka, J., Šmídl, L. (2022) Deep LSTM Spoken Term Detection using Wav2Vec 2.0 Recognizer. Proc. Interspeech 2022, 1886-1890
In recent years, standard hybrid DNN-HMM speech recognizers have been outperformed by end-to-end speech recognition systems. One very promising approach is the grapheme Wav2Vec 2.0 model, which uses the self-supervised pretraining approach…
External link:
http://arxiv.org/abs/2210.11885
Author:
Švec, Ján, Žmolíková, Kateřina, Kocour, Martin, Delcroix, Marc, Ochiai, Tsubasa, Mošner, Ladislav, Černocký, Jan
Recently, the performance of blind speech separation (BSS) and target speech extraction (TSE) has greatly progressed. Most works, however, focus on relatively well-controlled conditions using, e.g., read speech. The performance may degrade in more re…
External link:
http://arxiv.org/abs/2208.07091
Published in:
Interspeech 2022, 1831-1835
In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks u…
External link:
http://arxiv.org/abs/2206.07627
Published in:
In Journal of Voice September 2024 38(5):1035-1054
Author:
Kocour, Martin, Žmolíková, Kateřina, Ondel, Lucas, Švec, Ján, Delcroix, Marc, Ochiai, Tsubasa, Burget, Lukáš, Černocký, Jan
In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output stream separ…
External link:
http://arxiv.org/abs/2111.00009
Author:
Lehečka, Jan, Švec, Jan
Published in:
Statistical Language and Speech Processing, SLSP 2021. Cham: Springer, 2021. pages 27-37. ISBN: 978-3-030-89578-5, ISSN: 0302-9743
In this paper, we present our progress in pre-training monolingual Transformers for Czech and contribute to the research community by releasing our models to the public. The need for such models emerged from our effort to employ Transformers in our lang…
External link:
http://arxiv.org/abs/2107.10042
Published in:
In IFAC PapersOnLine 2024 58(9):7-12