Showing 1 - 10 of 54 for search: '"Naplava, P."'
Author:
Vonášek, Josef, Straka, Milan, Krč, Rostislav, Lasoňová, Lenka, Egorova, Ekaterina, Straková, Jana, Náplava, Jakub
We present CWRCzech, Click Web Ranking dataset for Czech, a 100M query-document Czech click dataset for relevance ranking with user behavior data collected from search engine logs of Seznam.cz. To the best of our knowledge, CWRCzech is the largest…
External link:
http://arxiv.org/abs/2405.20994
This article focuses on the development and evaluation of small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of l…
External link:
http://arxiv.org/abs/2311.13921
We introduce a large and diverse Czech corpus annotated for grammatical error correction (GEC) with the aim to contribute to the still scarce data resources in this domain for languages other than English. The Grammar Error Correction Corpus for Czech…
External link:
http://arxiv.org/abs/2201.05590
Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-…
External link:
http://arxiv.org/abs/2112.01810
We propose a character-based non-autoregressive GEC approach, with automatically generated character transformations. Recently, per-word classification of correction edits has proven an efficient, parallelizable alternative to current encoder-decoder…
External link:
http://arxiv.org/abs/2111.09280
Sensitivity of deep neural models to input noise is known to be a challenging problem. In NLP, model performance often deteriorates with naturally occurring noise, such as spelling errors. To mitigate this issue, models may leverage artificially nois…
External link:
http://arxiv.org/abs/2110.07428
Published in:
The Prague Bulletin of Mathematical Linguistics No. 116, 2021, pp. 27-42
We propose a new architecture for diacritics restoration based on contextualized embeddings, namely BERT, and we evaluate it on 12 languages with diacritics. Furthermore, we conduct a detailed error analysis on Czech, a morphologically rich language…
External link:
http://arxiv.org/abs/2105.11408
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Cz…
External link:
http://arxiv.org/abs/2105.11314
Author:
Náplava, Jakub, Straka, Milan
Grammatical error correction in English is a long-studied problem with many existing systems and datasets. However, there has been only limited research on error correction of other languages. In this paper, we present a new dataset AKCES-GEC on gr…
External link:
http://arxiv.org/abs/1910.00353
CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction
Author:
Náplava, Jakub, Straka, Milan
In this paper, we describe our systems submitted to the Building Educational Applications (BEA) 2019 Shared Task (Bryant et al., 2019). We participated in all three tracks. Our models are NMT systems based on the Transformer model, which we improve b…
External link:
http://arxiv.org/abs/1909.05553