Showing 1 - 10 of 96 for search: '"Nagata Masaaki"'
Author:
Kocmi, Tom, Avramidis, Eleftherios, Bawden, Rachel, Bojar, Ondřej, Dvorkovich, Anton, Federmann, Christian, Fishel, Mark, Freitag, Markus, Gowda, Thamme, Grundkiewicz, Roman, Haddow, Barry, Karpinska, Marzena, Koehn, Philipp, Marie, Benjamin, Murray, Kenton, Nagata, Masaaki, Popel, Martin, Popović, Maja, Shmatova, Mariya, Steingrímsson, Steinþór, Zouhar, Vilém
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any…
External link:
http://arxiv.org/abs/2407.19884
In this paper, we propose a two-phase training approach in which pre-trained large language models are continually pre-trained on parallel data and then supervised fine-tuned with a small amount of high-quality parallel data. To investigate the effectiveness…
External link:
http://arxiv.org/abs/2407.03145
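
The two-phase recipe described in this abstract maps onto standard tooling: continual pre-training of a causal LM on serialized parallel text, then supervised fine-tuning on a small high-quality set. The sketch below is a minimal illustration, not the paper's setup: the "gpt2" checkpoint, the toy sentence pairs, the "=" concatenation format, and all hyperparameters are placeholder assumptions.

# A minimal sketch of the two-phase recipe: (1) continual pre-training on
# serialized parallel text, then (2) supervised fine-tuning on a small set of
# high-quality pairs. Checkpoint, toy data, and hyperparameters are
# placeholders, not the paper's configuration.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for a pre-trained LLM
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

def lm_dataset(texts):
    """Tokenize raw text for causal-LM training."""
    ds = Dataset.from_dict({"text": texts})
    return ds.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                  batched=True, remove_columns=["text"])

# Phase 1: large (possibly noisy) parallel data, serialized as concatenations.
phase1 = lm_dataset(["Guten Morgen. = Good morning.",
                     "Danke schön. = Thank you very much."])
# Phase 2: a small, high-quality set in an instruction-like translation format.
phase2 = lm_dataset(["Translate German to English:\nGuten Morgen.\nGood morning."])

collator = DataCollatorForLanguageModeling(tok, mlm=False)  # labels = inputs
for stage, data, lr in [("continual-pt", phase1, 2e-5), ("sft", phase2, 1e-5)]:
    args = TrainingArguments(output_dir=f"out-{stage}", num_train_epochs=1,
                             per_device_train_batch_size=2, learning_rate=lr,
                             report_to=[])
    Trainer(model=model, args=args, train_dataset=data,
            data_collator=collator).train()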
Hallucination and omission, long-standing problems in machine translation (MT), are more pronounced when a large language model (LLM) is used for MT because an LLM is itself susceptible to these phenomena. In this work, we mitigate the…
External link:
http://arxiv.org/abs/2405.09223
Using crowdsourcing, we collected more than 10,000 URL pairs (parallel top-page pairs) of bilingual websites that contain parallel documents, and created a Japanese-Chinese parallel corpus of 4.6M sentence pairs from these websites. We used a Japanese…
External link:
http://arxiv.org/abs/2405.09017
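
The construction step implied by this abstract can be sketched in a few lines: fetch each crowdsourced top-page URL pair and pair up candidate sentences. Everything below is an illustrative assumption of mine, not the paper's pipeline: the URLs are placeholders, the segmentation is naive, and a real system would crawl whole sites and use a score-based sentence aligner rather than positional 1-to-1 pairing.

# A rough sketch of corpus construction from top-page URL pairs. URLs,
# segmentation, and the positional pairing are all illustrative stand-ins.
import requests
from bs4 import BeautifulSoup

def page_sentences(url):
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    # Naive split on the CJK full stop; a real system would use a
    # language-aware sentence segmenter.
    return [s.strip() for s in text.replace("。", "。\n").splitlines() if s.strip()]

url_pairs = [("https://example.com/ja/", "https://example.com/zh/")]  # placeholders
for ja_url, zh_url in url_pairs:
    # Crude positional pairing; the paper would use a proper aligner instead.
    for ja, zh in zip(page_sentences(ja_url), page_sentences(zh_url)):
        print(f"{ja}\t{zh}")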
Most existing word alignment methods rely on manual alignment datasets or parallel corpora, which limits their usefulness. Here, to mitigate the dependence on manual data, we broaden the source of supervision by relaxing the requirement for correct…
External link:
http://arxiv.org/abs/2306.05644
Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the…
External link:
http://arxiv.org/abs/2210.15861
To promote and further develop RST-style discourse parsing models, we need a strong baseline that can be regarded as a reference for reporting reliable experimental results. This paper explores a strong baseline by integrating existing simple parsing…
External link:
http://arxiv.org/abs/2210.08355
We define a novel concept called extended word alignment in order to improve post-editing assistance efficiency. Based on extended word alignment, we further propose a novel task called refined word-level QE that outputs refined tags and word-level…
External link:
http://arxiv.org/abs/2209.11378
Most current machine translation models are trained mainly on parallel corpora, and their translation accuracy largely depends on the quality and quantity of those corpora. Although there are billions of parallel sentences for a few language pairs…
External link:
http://arxiv.org/abs/2202.12607
We present a novel supervised word alignment method based on cross-language span prediction. We first formalize a word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence.
External link:
http://arxiv.org/abs/2004.14516
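
The span-prediction formulation in this last abstract is essentially extractive question answering: each source token becomes a "question" (the source sentence with that token marked), and a QA model extracts the aligned span from the target sentence. The sketch below illustrates the idea with an off-the-shelf multilingual QA checkpoint; the checkpoint, the marker scheme, and the lack of bidirectional symmetrization are my assumptions, not the paper's trained model.

# Word alignment as cross-language span prediction: for each source token,
# ask a QA model which target-side span it corresponds to. The checkpoint
# and marker scheme are illustrative stand-ins.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

src = "Das ist ein Test".split()
tgt = "This is a test"

for i, token in enumerate(src):
    # Mark the source token whose target-side span we want to predict.
    question = " ".join(src[:i] + ["¶", token, "¶"] + src[i + 1:])
    pred = qa(question=question, context=tgt)
    print(f"{token!r} -> {pred['answer']!r} (score {pred['score']:.2f})")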