Search results - "Novotny, Vit"

Report

People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts

Author: Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, Horák, Aleš

Although pre-trained named entity recognition (NER) models are highly accurate on modern corpora, they underperform on historical texts due to differences in language and OCR errors. In this work, we develop a new NER corpus of 3.6M sentences from late m

External link: http://arxiv.org/abs/2305.16718

Report

Adaptor: Objective-Centric Adaptation Framework for Language Models

Author: Štefánik, Michal, Novotný, Vít, Groverová, Nikola, Sojka, Petr

Progress in natural language processing research is catalyzed by the possibilities given by widespread software frameworks. This paper introduces the Adaptor library that transposes the traditional model-centric approach composed of pre-training + fi

External link: http://arxiv.org/abs/2203.03989

Academic article

Chromatography-free synthesis of 2A,2B-disulfonated β-cyclodextrin for regiospecific di-substitution

Author: Hobbs, Christopher J., Novotný, Vít, Řezanka, Michal

Published in: Carbohydrate Polymers, 15 January 2025, 348, Part B

Report

Regressive Ensemble for Machine Translation Quality Evaluation

Author: Štefánik, Michal, Novotný, Vít, Sojka, Petr

This work introduces a simple regressive ensemble for evaluating machine translation quality based on a set of novel and established metrics. We evaluate the ensemble using a correlation to expert-based MQM scores of the WMT 2021 Metrics workshop. In

External link: http://arxiv.org/abs/2109.07242

Report

WebMIaS on Docker: Deploying Math-Aware Search in a Single Line of Code

Author: Lupták, Dávid, Novotný, Vít, Štefánik, Michal, Sojka, Petr

Math information retrieval (MIR) search engines are absent from widespread production use, even though documents in the STEM fields contain many mathematical formulae, which are sometimes more important than text for understanding. We have devel

External link: http://arxiv.org/abs/2106.00411

Report

When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting

Author: Novotný, Vít, Štefánik, Michal, Ayetiran, Eniafe Festus, Sojka, Petr, Řehůřek, Radim

Published in: J. Univers. Comput. Sci. 28:2 (2022) 181-201

In 2018, Mikolov et al. introduced the positional language model, which has characteristics of attention-based neural machine translation models and which achieved state-of-the-art performance on the intrinsic word analogy task. However, the position

External link: http://arxiv.org/abs/2104.09691

Report

EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses

Author: Ayetiran, Eniafe Festus, Sojka, Petr, Novotný, Vít

Published in: Knowledge-Based Systems. 219 (2021) 106902

Several language applications often require word semantics as a core part of their processing pipeline, either as precise meaning inference or semantic similarity. Multi-sense embeddings (M-SE) can be exploited for this important requirement. M-SE se

External link: http://arxiv.org/abs/2103.00232

Report

One Size Does Not Fit All: Finding the Optimal Subword Sizes for FastText Models across Languages

Author: Novotný, Vít, Ayetiran, Eniafe Festus, Bačovský, Dalibor, Lupták, Dávid, Štefánik, Michal, Sojka, Petr

Published in: RANLP (2021) 1072-1078

Unsupervised representation learning of words from large multilingual corpora is useful for downstream tasks such as word sense disambiguation, semantic text similarity, and information retrieval. The representation precision of log-bilinear fastText

External link: http://arxiv.org/abs/2102.02585

Report

Text classification with word embedding regularization and soft similarity measure

Author: Novotný, Vít, Ayetiran, Eniafe Festus, Štefánik, Michal, Sojka, Petr

Since the seminal work of Mikolov et al., word embeddings have become the preferred word representations for many natural language processing tasks. Document similarity measures extracted from word embeddings, such as the soft cosine measure (SCM) an

External link: http://arxiv.org/abs/2003.05019

Report

Implementation Notes for the Soft Cosine Measure

Author: Novotný, Vít

The standard bag-of-words vector space model (VSM) is efficient and ubiquitous in information retrieval, but it underestimates the similarity of documents with the same meaning but different terminology. To overcome this limitation, Sidorov et al.

External link: http://arxiv.org/abs/1808.09407
