Showing 1 - 8 of 8 for search: '"Loem, Mengsay"'
Author:
Fujii, Kazuki, Nakamura, Taishi, Loem, Mengsay, Iida, Hiroki, Ohi, Masanari, Hattori, Kakeru, Shota, Hirai, Mizuki, Sakae, Yokota, Rio, Okazaki, Naoaki
Cross-lingual continual pre-training of large language models (LLMs) initially trained on an English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost. In this study, we constructed Swallow, an LLM …
External link:
http://arxiv.org/abs/2404.17790
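Note: the entry above describes cross-lingual continual pre-training, i.e. taking an LLM already pre-trained mostly on English and continuing next-token-prediction training on a Japanese corpus. Below is a minimal sketch of that idea with Hugging Face Transformers; the base checkpoint, corpus file, and hyperparameters are illustrative assumptions, not the Swallow recipe.

    # Continual pre-training sketch (assumed setup, not the Swallow configuration).
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    base = "meta-llama/Llama-2-7b-hf"        # hypothetical English-centric base model
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Hypothetical Japanese web-text file, one document per line.
    corpus = load_dataset("text", data_files={"train": "ja_corpus.txt"})["train"]
    tokenized = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
        batched=True, remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="cpt-ja", per_device_train_batch_size=1,
                               gradient_accumulation_steps=64, learning_rate=1e-5,
                               num_train_epochs=1, bf16=True),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # continue causal-LM training on the Japanese corpus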
Author:
Okazaki, Naoaki, Hattori, Kakeru, Shota, Hirai, Iida, Hiroki, Ohi, Masanari, Fujii, Kazuki, Nakamura, Taishi, Loem, Mengsay, Yokota, Rio, Mizuki, Sakae
Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created with the quality of Japanese texts in mind. This study builds a large Japanese web corpus …
External link:
http://arxiv.org/abs/2404.17733
Published in:
ACL 2024 (Findings)
Large Language Models (LLMs) are widely used as automated metrics to evaluate natural language generation tasks. However, the likelihood, a measure of an LLM's plausibility for a sentence, can vary due to superficial differences in sentences, such as word order …
External link:
http://arxiv.org/abs/2402.15987
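Note: the likelihood referred to above is the probability an LLM assigns to a sentence, usually reported as a sum or mean of token log-probabilities, and two paraphrases that differ only superficially can receive noticeably different scores. A small sketch of computing it with a Hugging Face causal LM follows; the model choice is an arbitrary assumption, not the evaluator studied in the paper.

    # Score a sentence by its log-likelihood under a causal LM (illustrative model).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # assumed small model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def sentence_log_likelihood(text: str) -> float:
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # With labels=input_ids the model returns the mean next-token
            # cross-entropy; multiply by the number of predicted tokens to
            # obtain the total log-likelihood.
            loss = model(ids, labels=ids).loss
        return -loss.item() * (ids.size(1) - 1)

    # Superficially different phrasings of the same content can score differently.
    print(sentence_log_likelihood("The summary covers the main findings."))
    print(sentence_log_likelihood("The main findings are covered by the summary."))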
Large Language Models (LLMs) can justify or critique their predictions through discussions with other models or humans, thereby enriching their intrinsic understanding of instances. While proactive discussions in the inference phase have been shown to …
External link:
http://arxiv.org/abs/2311.08107
Large-scale pre-trained language models such as GPT-3 have shown remarkable performance across various natural language processing tasks. However, applying prompt-based methods with GPT-3 to Grammatical Error Correction (GEC) tasks and their controllability …
External link:
http://arxiv.org/abs/2305.18156
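Note: prompt-based GEC here means instructing a general-purpose model to correct a sentence, with the style of correction (e.g. minimal edits versus fluent rewriting) controlled through the prompt. The sketch below uses the OpenAI chat API; the model name, prompt wording, and control instruction are assumptions for illustration, not the prompts evaluated in the paper.

    # Prompt-based grammatical error correction sketch (assumed prompt and model).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def correct(sentence: str, minimal_edits: bool = True) -> str:
        style = ("Make only the minimal edits needed for grammaticality."
                 if minimal_edits else
                 "Rephrase freely for fluency.")
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # illustrative stand-in for the GPT-3 models in the paper
            messages=[
                {"role": "system",
                 "content": "You correct grammatical errors in English sentences."},
                {"role": "user",
                 "content": f"{style}\nSentence: {sentence}\nCorrected:"},
            ],
            temperature=0,
        )
        return response.choices[0].message.content.strip()

    print(correct("She go to school every days."))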
The impressive performance of the Transformer has been attributed to self-attention, where dependencies across the entire input sequence are considered at every position. In this work, we reform the neural $n$-gram model, which focuses on only several surrounding …
External link:
http://arxiv.org/abs/2207.13354
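Note: a neural $n$-gram model predicts each token from a fixed window of the n-1 preceding tokens only, whereas self-attention attends over the whole sequence at every position. Below is a minimal PyTorch sketch of such a model; the layer sizes and the plain feed-forward formulation are assumptions, not the reformulation proposed in the paper.

    # Minimal neural n-gram language model: predict the next token from a
    # fixed window of the previous n-1 tokens (no access to the rest of the input).
    import torch
    import torch.nn as nn

    class NeuralNGramLM(nn.Module):
        def __init__(self, vocab_size: int, n: int = 4, d_emb: int = 128, d_hidden: int = 256):
            super().__init__()
            self.n = n
            self.embed = nn.Embedding(vocab_size, d_emb)
            self.ff = nn.Sequential(
                nn.Linear((n - 1) * d_emb, d_hidden),
                nn.Tanh(),
                nn.Linear(d_hidden, vocab_size),
            )

        def forward(self, context: torch.Tensor) -> torch.Tensor:
            # context: (batch, n-1) ids of the preceding window only.
            emb = self.embed(context).flatten(start_dim=1)  # (batch, (n-1)*d_emb)
            return self.ff(emb)                             # (batch, vocab_size) logits

    model = NeuralNGramLM(vocab_size=1000)
    logits = model(torch.randint(0, 1000, (8, 3)))  # 8 windows of n-1 = 3 tokens
    print(logits.shape)                             # torch.Size([8, 1000])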
Neural models trained with large amounts of parallel data have achieved impressive performance in abstractive summarization tasks. However, large-scale parallel corpora are expensive and challenging to construct. In this work, we introduce a low-cost …
External link:
http://arxiv.org/abs/2201.05313