Showing 1 - 10 of 23 for search: '"Acs, Judit"'
Training summarization models requires substantial amounts of training data. However, for less-resourced languages such as Hungarian, openly available models and datasets are notably scarce. To address this gap, our paper introduces HunSum-2, an open-source…
External link:
http://arxiv.org/abs/2404.03555
Data augmentation methods for neural machine translation are particularly useful when only a limited amount of training data is available, which is often the case when dealing with low-resource languages. We introduce a novel augmentation method, which generates…
External link:
http://arxiv.org/abs/2311.02355
We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across…
External link:
http://arxiv.org/abs/2307.07025
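The abstract above describes the mechanism concretely enough to sketch. Below is a minimal, illustrative take on subtree swapping: a dependency parse is reduced to a head array, the token indices of a subtree are collected, and corresponding (assumed contiguous) spans are exchanged between two sentences. The parse representation and the choice of which subtrees correspond are simplifying assumptions, not the paper's actual alignment procedure.

```python
# Illustrative sketch of data augmentation via dependency subtree swapping.
# The head-array parse format and the hand-picked "corresponding" subtrees
# are assumptions for demonstration purposes only.

def subtree_span(heads, root):
    """Return sorted token indices of the subtree rooted at `root`.

    `heads[i]` is the index of token i's head (-1 for the sentence root).
    """
    members = {root}
    changed = True
    while changed:  # fixed point: add tokens whose head is already a member
        changed = False
        for i, h in enumerate(heads):
            if h in members and i not in members:
                members.add(i)
                changed = True
    return sorted(members)

def swap_subtrees(tokens_a, span_a, tokens_b, span_b):
    """Swap the token spans of two (assumed contiguous) subtrees."""
    seg_a = [tokens_a[i] for i in span_a]
    seg_b = [tokens_b[i] for i in span_b]
    new_a = tokens_a[:span_a[0]] + seg_b + tokens_a[span_a[-1] + 1:]
    new_b = tokens_b[:span_b[0]] + seg_a + tokens_b[span_b[-1] + 1:]
    return new_a, new_b

# Toy example: swap the object noun phrases of two sentences.
sent_a = ["I", "saw", "a", "black", "cat"]
heads_a = [1, -1, 4, 4, 1]          # "a black cat" hangs off "saw"
sent_b = ["She", "bought", "an", "old", "house"]
heads_b = [1, -1, 4, 4, 1]

span_a = subtree_span(heads_a, 4)   # indices of "a black cat"
span_b = subtree_span(heads_b, 4)   # indices of "an old house"
print(swap_subtrees(sent_a, span_a, sent_b, span_b))
# (['I', 'saw', 'an', 'old', 'house'], ['She', 'bought', 'a', 'black', 'cat'])
```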
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label…
External link:
http://arxiv.org/abs/2306.06205
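A probing task of the shape described above can be set up with a few lines of code: freeze a pretrained encoder, represent the target word, and train a small classifier to predict the morphological tag. The model name, the mean-over-subwords pooling, the logistic-regression probe, and the toy number-prediction task below are all illustrative assumptions, not the paper's exact setup.

```python
# Minimal morphological probing sketch: frozen encoder + small classifier.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

def target_word_vector(sentence: str, target: str) -> torch.Tensor:
    """Mean-pool the hidden states of the subwords belonging to `target`."""
    words = sentence.split()
    t_idx = words.index(target)
    batch = tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state[0]  # (seq_len, dim)
    sub_ids = [i for i, w in enumerate(batch.word_ids()) if w == t_idx]
    return hidden[sub_ids].mean(dim=0)

# Hypothetical toy task: probe for the grammatical number of the target word.
train = [
    ("the cat sleeps", "cat", "Sing"),
    ("the cats sleep", "cats", "Plur"),
    ("a dog barks", "dog", "Sing"),
    ("the dogs bark", "dogs", "Plur"),
]
X = torch.stack([target_word_vector(s, w) for s, w, _ in train]).numpy()
y = [label for _, _, label in train]
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict(X))
```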
We introduce HunSum-1: a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset…
External link:
http://arxiv.org/abs/2302.00455
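The cleaning and deduplication step mentioned above can be illustrated with a minimal sketch: hash a normalized form of each article and keep only the first occurrence. Exact-duplicate removal on lowercased, whitespace-collapsed text is an assumption for demonstration; a production pipeline for crawled news data would typically also handle near-duplicates.

```python
# Minimal deduplication sketch for a crawled corpus (exact duplicates only).
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants collide."""
    return " ".join(text.lower().split())

def deduplicate(articles):
    seen, unique = set(), []
    for art in articles:
        digest = hashlib.sha1(normalize(art).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(art)
    return unique

docs = ["A hír szövege.", "a hír   szövege.", "Másik cikk."]
print(deduplicate(docs))  # the whitespace/case variant is dropped
```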
We train Transformer-based neural machine translation models for Hungarian-English and English-Hungarian using the Hunglish2 corpus. Our best models achieve a BLEU score of 40.0 on Hungarian-English and 33.4 on English-Hungarian. Furthermore, we present…
External link:
http://arxiv.org/abs/2201.06876
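For reference, corpus-level BLEU scores like the 40.0 / 33.4 reported above are commonly computed with sacrebleu. The sentences below are made-up placeholders, and sacrebleu is shown as the usual tool for this metric, not necessarily the one the paper used.

```python
# Computing corpus-level BLEU with sacrebleu on placeholder data.
import sacrebleu

hypotheses = ["a macska a szonyegen ul"]
references = [["a macska a szonyegen ul"]]  # one list per reference corpus

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```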
We present the BME submission for SIGMORPHON 2021 Task 0 Part 1, the Generalization Across Typologically Diverse Languages shared task. We use an LSTM encoder-decoder model with three-step training that is first trained on all languages, then fine-tuned…
External link:
http://arxiv.org/abs/2109.07006
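The three-step training schedule in the truncated abstract above can be sketched schematically: train on pooled multilingual data first, then fine-tune on progressively narrower subsets. The later stages (an assumed language-family subset, then the single target language), the shrinking learning rates, and the stand-in model are all guesses for illustration; this is not the BME submission itself.

```python
# Schematic three-step curriculum: broad data first, narrower data later.
import torch

def train_stage(model, batches, lr, epochs):
    """Generic training loop reused for each stage of the curriculum."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()

model = torch.nn.Linear(8, 8)  # stand-in for the LSTM encoder-decoder
all_langs = [(torch.randn(4, 8), torch.randn(4, 8))]
family = [(torch.randn(4, 8), torch.randn(4, 8))]
target = [(torch.randn(4, 8), torch.randn(4, 8))]

train_stage(model, all_langs, lr=1e-3, epochs=2)  # step 1: all languages
train_stage(model, family, lr=5e-4, epochs=2)     # step 2: assumed family subset
train_stage(model, target, lr=1e-4, epochs=2)     # step 3: assumed target language
```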
Transformer-based language models such as BERT have outperformed previous models on a large number of English benchmarks, but their evaluation is often limited to English or a small number of well-resourced languages. In this work, we evaluate monolingual…
External link:
http://arxiv.org/abs/2109.06327
Published in:
EACL 2021
Contextual word representations have become a standard in modern natural language processing systems. These models use subword tokenization to handle large vocabularies and unknown words. Word-level usage of such systems requires a way of pooling multiple…
External link:
http://arxiv.org/abs/2102.10864
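The pooling problem described above is easy to demonstrate: a word may be split into several subword vectors that must be combined into one word vector. The sketch below shows three common strategies (first subword, last subword, mean); the model name is a placeholder, and the paper compares a wider set of pooling methods than shown here.

```python
# Sketch of word-level pooling over subword vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
enc = AutoModel.from_pretrained("bert-base-cased").eval()

words = ["tokenization", "matters"]
batch = tok(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = enc(**batch).last_hidden_state[0]  # (seq_len, dim)

word_ids = batch.word_ids()  # maps each subword position to its word index
for w_idx, word in enumerate(words):
    sub = [i for i, w in enumerate(word_ids) if w == w_idx]
    first = hidden[sub[0]]          # first-subword pooling
    last = hidden[sub[-1]]          # last-subword pooling
    mean = hidden[sub].mean(dim=0)  # mean pooling
    print(word, len(sub), "subwords", mean.shape)
```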
Published in:
Hungarian NLP Conference (MSZNY 2021)
We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against 4 multilingual models, including the multilingual BERT model. We evaluate these models through three tasks: morphological probing…
External link:
http://arxiv.org/abs/2102.10848