Showing 1 - 10 of 40 for search: '"Psutka, Josef V."'
Published in:
Proceedings of Interspeech 2024
In this paper, we are comparing monolingual Wav2Vec 2.0 models with various multilingual models to see whether we could improve speech recognition performance on a unique oral history archive containing a lot of mixed-language sentences. Our main goa
External link:
http://arxiv.org/abs/2407.17160
Published in:
Text, Speech, and Dialogue: 26th International Conference, TSD 2023
In this paper, we are comparing several methods of training the Slovak speech recognition models based on the Transformers architecture. Specifically, we are exploring the approach of transfer learning from the existing Czech pre-trained Wav2Vec 2.0
External link:
http://arxiv.org/abs/2306.04399
Published in:
Švec, J., Šmídl, L., Psutka, J.V., Pražák, A. (2021) Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings. Proc. Interspeech 2021, 4398-4402
The paper describes a novel approach to Spoken Term Detection (STD) in large spoken archives using deep LSTM networks. The work is based on the previous approach of using Siamese neural networks for STD and naturally extends it to directly localize a
External link:
http://arxiv.org/abs/2210.11895
Published in:
TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham
Czech is a very specific language due to its large differences between the formal and the colloquial form of speech. While the formal (written) form is used mainly in official documents, literature, and public speeches, the colloquial (spoken) form i
External link:
http://arxiv.org/abs/2206.07666
Published in:
Interspeech 2022, 1831-1835
In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks u
External link:
http://arxiv.org/abs/2206.07627
Author:
Psutka, Josef V., Psutka, Josef
Published in:
In Pattern Recognition July 2019 91:25-33
Academic article
Published in:
Multimedia Tools & Applications; Jan2020, Vol. 79 Issue 1/2, p1203-1220, 18p
The article describes the process and some interesting findings obtained during the subtitling of more than 70 hours of live television broadcasts from the Olympic Games in Sochi. The closed captions were created for the ČT Sport channel, which is a sports channel
External link:
https://explore.openaire.eu/search/publication?articleId=od______8936::d423f24ba02657758207922abdea6aa0
http://hdl.handle.net/11025/17180
An estimation of parameters of a multivariate Gaussian Mixture Model is usually based on a criterion (e.g. Maximum Likelihood) that is focused mostly on training data. Therefore, testing data, which were not seen during the training procedure, may ca
External link:
https://explore.openaire.eu/search/publication?articleId=od______8936::89558b0da232388b1cec10af9e8cecf6
http://hdl.handle.net/11025/17161