Showing 1 - 10 of 96 for search: '"Nagata Masaaki"'
Author:
Kocmi, Tom, Avramidis, Eleftherios, Bawden, Rachel, Bojar, Ondřej, Dvorkovich, Anton, Federmann, Christian, Fishel, Mark, Freitag, Markus, Gowda, Thamme, Grundkiewicz, Roman, Haddow, Barry, Karpinska, Marzena, Koehn, Philipp, Marie, Benjamin, Murray, Kenton, Nagata, Masaaki, Popel, Martin, Popović, Maja, Shmatova, Mariya, Steingrímsson, Steinþór, Zouhar, Vilém
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any…
External link:
http://arxiv.org/abs/2407.19884
In this paper, we propose a two-phase training approach in which pre-trained large language models are continually pre-trained on parallel data and then supervised fine-tuned with a small amount of high-quality parallel data. To investigate the effectiveness…
External link:
http://arxiv.org/abs/2407.03145
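
The two-phase recipe described in this abstract maps onto standard tooling: continual pre-training of a causal LM on serialized parallel text, then supervised fine-tuning on a small high-quality set. The sketch below is a minimal illustration, not the paper's setup: the "gpt2" checkpoint, the toy sentence pairs, the "=" concatenation format, and all hyperparameters are placeholder assumptions.

# A minimal sketch of the two-phase recipe: (1) continual pre-training on
# serialized parallel text, then (2) supervised fine-tuning on a small set of
# high-quality pairs. Checkpoint, toy data, and hyperparameters are
# placeholders, not the paper's configuration.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for a pre-trained LLM
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

def lm_dataset(texts):
    """Tokenize raw text for causal-LM training."""
    ds = Dataset.from_dict({"text": texts})
    return ds.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                  batched=True, remove_columns=["text"])

# Phase 1: large (possibly noisy) parallel data, serialized as concatenations.
phase1 = lm_dataset(["Guten Morgen. = Good morning.",
                     "Danke schön. = Thank you very much."])
# Phase 2: a small, high-quality set in an instruction-like translation format.
phase2 = lm_dataset(["Translate German to English:\nGuten Morgen.\nGood morning."])

collator = DataCollatorForLanguageModeling(tok, mlm=False)  # labels = inputs
for stage, data, lr in [("continual-pt", phase1, 2e-5), ("sft", phase2, 1e-5)]:
    args = TrainingArguments(output_dir=f"out-{stage}", num_train_epochs=1,
                             per_device_train_batch_size=2, learning_rate=lr,
                             report_to=[])
    Trainer(model=model, args=args, train_dataset=data,
            data_collator=collator).train()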
Hallucination and omission, long-standing problems in machine translation (MT), are more pronounced when a large language model (LLM) is used for MT because an LLM is itself susceptible to these phenomena. In this work, we mitigate the…
External link:
http://arxiv.org/abs/2405.09223
Using crowdsourcing, we collected more than 10,000 URL pairs (parallel top-page pairs) of bilingual websites that contain parallel documents, and created a Japanese-Chinese parallel corpus of 4.6M sentence pairs from these websites. We used a Japanese…
External link:
http://arxiv.org/abs/2405.09017
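
The construction step implied by this abstract can be sketched in a few lines: fetch each crowdsourced top-page URL pair and pair up candidate sentences. Everything below is an illustrative assumption of mine, not the paper's pipeline: the URLs are placeholders, the segmentation is naive, and a real system would crawl whole sites and use a score-based sentence aligner rather than positional 1-to-1 pairing.

# A rough sketch of corpus construction from top-page URL pairs. URLs,
# segmentation, and the positional pairing are all illustrative stand-ins.
import requests
from bs4 import BeautifulSoup

def page_sentences(url):
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    # Naive split on the CJK full stop; a real system would use a
    # language-aware sentence segmenter.
    return [s.strip() for s in text.replace("。", "。\n").splitlines() if s.strip()]

url_pairs = [("https://example.com/ja/", "https://example.com/zh/")]  # placeholders
for ja_url, zh_url in url_pairs:
    # Crude positional pairing; the paper would use a proper aligner instead.
    for ja, zh in zip(page_sentences(ja_url), page_sentences(zh_url)):
        print(f"{ja}\t{zh}")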
Most existing word alignment methods rely on manual alignment datasets or parallel corpora, which limits their usefulness. Here, to mitigate the dependence on manual data, we broaden the source of supervision by relaxing the requirement for correct…
External link:
http://arxiv.org/abs/2306.05644
Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the…
External link:
http://arxiv.org/abs/2210.15861
To promote and further develop RST-style discourse parsing models, we need a strong baseline that can be regarded as a reference for reporting reliable experimental results. This paper explores a strong baseline by integrating existing simple parsing…
External link:
http://arxiv.org/abs/2210.08355
We define a novel concept called extended word alignment in order to improve post-editing assistance efficiency. Based on extended word alignment, we further propose a novel task called refined word-level QE that outputs refined tags and word-level…
External link:
http://arxiv.org/abs/2209.11378
Most current machine translation models are trained mainly on parallel corpora, and their translation accuracy largely depends on the quality and quantity of those corpora. Although there are billions of parallel sentences for a few language pairs…
External link:
http://arxiv.org/abs/2202.12607
We present a novel supervised word alignment method based on cross-language span prediction. We first formalize a word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence.
External link:
http://arxiv.org/abs/2004.14516
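
The span-prediction formulation in this last abstract is essentially extractive question answering: each source token becomes a "question" (the source sentence with that token marked), and a QA model extracts the aligned span from the target sentence. The sketch below illustrates the idea with an off-the-shelf multilingual QA checkpoint; the checkpoint, the marker scheme, and the lack of bidirectional symmetrization are my assumptions, not the paper's trained model.

# Word alignment as cross-language span prediction: for each source token,
# ask a QA model which target-side span it corresponds to. The checkpoint
# and marker scheme are illustrative stand-ins.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

src = "Das ist ein Test".split()
tgt = "This is a test"

for i, token in enumerate(src):
    # Mark the source token whose target-side span we want to predict.
    question = " ".join(src[:i] + ["¶", token, "¶"] + src[i + 1:])
    pred = qa(question=question, context=tgt)
    print(f"{token!r} -> {pred['answer']!r} (score {pred['score']:.2f})")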