Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
Author: | Saito, Koshiro; Mizuki, Sakae; Ohi, Masanari; Nakamura, Taishi; Shiotani, Taihei; Maeda, Koki; Ma, Youmi; Hattori, Kakeru; Fujii, Kazuki; Okamoto, Takumi; Ishida, Shigeki; Takamura, Hiroya; Yokota, Rio; Okazaki, Naoaki |
---|---|
Year of publication: | 2024 |
Subject: | |
Document type: | Working Paper |
Description: | Why do we build local large language models (LLMs)? What should a local LLM learn from the target language? Which abilities can be transferred from other languages? Do language-specific scaling laws exist? To explore these research questions, we evaluated 35 Japanese, English, and multilingual LLMs on 19 evaluation benchmarks for Japanese and English, taking Japanese as a local language. Adopting an observational approach, we analyzed correlations between benchmark scores and conducted principal component analysis (PCA) on the scores to derive *ability factors* of local LLMs. We found that training on English text can improve scores on Japanese academic subjects (JMMLU). In addition, it is unnecessary to specifically train on Japanese text to enhance abilities for Japanese code generation, arithmetic reasoning, commonsense, and reading comprehension tasks. In contrast, training on Japanese text could improve performance on question-answering tasks about Japanese knowledge and on English-Japanese translation, which indicates that the abilities for solving these two tasks can be regarded as *Japanese abilities* of LLMs. Furthermore, we confirmed that these Japanese abilities scale with the computational budget allocated to Japanese text. Comment: Preprint. Under review |
Database: | arXiv |
External link: |
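
The abstract's observational method correlates benchmark scores and applies PCA to a models-by-benchmarks score matrix to extract ability factors. The following is a minimal, hypothetical sketch of that kind of analysis, assuming a 35 × 19 score matrix; the random data, variable names, and choice of five components are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical score matrix: rows = 35 LLMs, columns = 19 benchmarks.
# A real analysis would load actual benchmark results; random values are placeholders.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=(35, 19))

# Correlation matrix between benchmarks (19 x 19), as in the correlation analysis.
benchmark_corr = np.corrcoef(scores, rowvar=False)

# Standardize each benchmark column, then run PCA; the leading components
# serve as candidate "ability factors" shared across benchmarks.
scaled = StandardScaler().fit_transform(scores)
pca = PCA(n_components=5)                   # number of factors is an assumption
factor_scores = pca.fit_transform(scaled)   # (35, 5): each model's factor scores
loadings = pca.components_                  # (5, 19): each benchmark's loading per factor

print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```

Interpreting the factors amounts to inspecting which benchmarks load heavily on each component; for example, a component dominated by Japanese question-answering and English-Japanese translation benchmarks would correspond to the *Japanese abilities* described in the abstract.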