Authors: |
Ennen, Philipp; Hsu, Po-Chun; Hsu, Chan-Jan; Liu, Chang-Le; Wu, Yen-Chen; Liao, Yin-Hsiang; Lin, Chin-Tung; Shiu, Da-Shan; Ma, Wei-Yun |
Year of publication: |
2023 |
Document type: |
Working Paper |
Description: |
In this paper we present the multilingual language model BLOOM-zh, which features enhanced support for Traditional Chinese. BLOOM-zh has its origins in the open-source BLOOM models presented by BigScience in 2022. Starting from the released models, we extended the pre-training of BLOOM by an additional 7.4 billion tokens in Traditional Chinese and English, covering a variety of domains such as news articles, books, encyclopedias, educational materials, and spoken language. To demonstrate the properties of BLOOM-zh, we evaluate its performance on both existing and newly created benchmark scenarios. BLOOM-zh outperforms its predecessor on most Traditional Chinese benchmarks while maintaining its English capability. We release all our models to the research community. |
Database: |
arXiv |