Showing 1 - 7 of 7 for search: '"Chousa, Katsuki"'
Using crowdsourcing, we collected more than 10,000 URL pairs (parallel top page pairs) of bilingual websites that contain parallel documents and created a Japanese-Chinese parallel corpus of 4.6M sentence pairs from these websites. We used a Japanese
External link:
http://arxiv.org/abs/2405.09017
Author:
Tsukagoshi, Hayato, Hirao, Tsutomu, Morishita, Makoto, Chousa, Katsuki, Sasano, Ryohei, Takeda, Koichi
The task of Split and Rephrase, which splits a complex sentence into multiple simple sentences with the same meaning, improves readability and enhances the performance of downstream tasks in natural language processing (NLP). However, while Split and
External link:
http://arxiv.org/abs/2404.09002
Most current machine translation models are mainly trained with parallel corpora, and their translation accuracy largely depends on the quality and quantity of the corpora. Although there are billions of parallel sentences for a few language pairs, e
External link:
http://arxiv.org/abs/2202.12607
Author:
Chousa, Katsuki, Morishita, Makoto
This paper describes our systems that were submitted to the restricted translation task at WAT 2021. In this task, the systems are required to output translated sentences that contain all given word constraints. Our system combined input augmentation
External link:
http://arxiv.org/abs/2106.05450
In this paper, we propose a method to extract bilingual texts automatically from noisy parallel corpora by framing the problem as a token-level span prediction, such as SQuAD-style Reading Comprehension. To extract a span of the target document that
External link:
http://arxiv.org/abs/2004.14517
Simultaneous machine translation is a variant of machine translation that starts the translation process before the end of an input. This task faces a trade-off between translation accuracy and latency. We have to determine when we start the translat
External link:
http://arxiv.org/abs/1911.11933
In neural machine translation (NMT), the computational cost at the output layer increases with the size of the target-side vocabulary. Using a limited-size vocabulary instead may cause a significant decrease in translation quality. This trade-off is
External link:
http://arxiv.org/abs/1807.11219