Search results

Report

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

Authors: Nakata, Wataru, Seki, Kentaro, Yanaka, Hitomi, Saito, Yuki, Takamichi, Shinnosuke, Saruwatari, Hiroshi

Spoken dialogue plays a crucial role in human-AI interactions, necessitating dialogue-oriented spoken language models (SLMs). To develop versatile SLMs, large-scale and diverse speech datasets are essential. Additionally, to ensure high-quality speec

External link: http://arxiv.org/abs/2407.15828

Report

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Authors: LLM-jp, Aizawa, Akiko, Aramaki, Eiji, Chen, Bowen, Cheng, Fei, Deguchi, Hiroyuki, Enomoto, Rintaro, Fujii, Kazuki, Fukumoto, Kensuke, Fukushima, Takuya, Han, Namgi, Harada, Yuto, Hashimoto, Chikara, Hiraoka, Tatsuya, Hisada, Shohei, Hosokawa, Sosuke, Jie, Lu, Kamata, Keisuke, Kanazawa, Teruhito, Kanezashi, Hiroki, Kataoka, Hiroshi, Katsumata, Satoru, Kawahara, Daisuke, Kawano, Seiya, Keyaki, Atsushi, Kiryu, Keisuke, Kiyomaru, Hirokazu, Kodama, Takashi, Kubo, Takahiro, Kuga, Yohei, Kumon, Ryoma, Kurita, Shuhei, Kurohashi, Sadao, Li, Conglong, Maekawa, Taiki, Matsuda, Hiroshi, Miyao, Yusuke, Mizuki, Kentaro, Mizuki, Sakae, Murawaki, Yugo, Nakamura, Ryo, Nakamura, Taishi, Nakayama, Kouta, Nakazato, Tomoka, Niitsuma, Takuro, Nishitoba, Jiro, Oda, Yusuke, Ogawa, Hayato, Okamoto, Takumi, Okazaki, Naoaki, Oseki, Yohei, Ozaki, Shintaro, Ryu, Koki, Rzepka, Rafal, Sakaguchi, Keisuke, Sasaki, Shota, Sekine, Satoshi, Suda, Kohei, Sugawara, Saku, Sugiura, Issa, Sugiyama, Hiroaki, Suzuki, Hisami, Suzuki, Jun, Suzumura, Toyotaro, Tachibana, Kensuke, Takagi, Yu, Takami, Kyosuke, Takeda, Koichi, Takeshita, Masashi, Tanaka, Masahiro, Taura, Kenjiro, Tolmachev, Arseny, Ueda, Nobuhiro, Wan, Zhen, Yada, Shuntaro, Yahata, Sakiko, Yamamoto, Yuya, Yamauchi, Yusuke, Yanaka, Hitomi, Yokota, Rio, Yoshino, Koichiro

This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants

External link: http://arxiv.org/abs/2407.03963

Report

Evaluating Structural Generalization in Neural Machine Translation

Authors: Kumon, Ryoma, Matsuoka, Daiki, Yanaka, Hitomi

Compositional generalization refers to the ability to generalize to novel combinations of previously observed words and syntactic structures. Since it is regarded as a desired property of neural models, recent work has assessed compositional generali

External link: http://arxiv.org/abs/2406.13363

Report

Exploring Intra and Inter-language Consistency in Embeddings with ICA

Authors: Li, Rongzhi, Matsuda, Takeru, Yanaka, Hitomi

Word embeddings represent words as multidimensional real vectors, facilitating data analysis and processing, but are often challenging to interpret. Independent Component Analysis (ICA) creates clearer semantic axes by identifying independent key fea

External link: http://arxiv.org/abs/2406.12474

Report

Analyzing Social Biases in Japanese Large Language Models

Authors: Yanaka, Hitomi, Han, Namgi, Kumon, Ryoma, Lu, Jie, Takeshita, Masashi, Sekizawa, Ryo, Kato, Taisei, Arai, Hiromi

With the development of Large Language Models (LLMs), social biases in the LLMs have become a crucial issue. While various benchmarks for social biases have been provided across languages, the extent to which Japanese LLMs exhibit social biases has n

External link: http://arxiv.org/abs/2406.02050

Report

Comprehensive Evaluation of Large Language Models for Topic Modeling

Authors: Doi, Tomoki, Isonuma, Masaru, Yanaka, Hitomi

Recent work utilizes Large Language Models (LLMs) for topic modeling, generating comprehensible topic labels for given documents. However, their performance has mainly been evaluated qualitatively, and there remains room for quantitative investigatio

External link: http://arxiv.org/abs/2406.00697

Report

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

Authors: Kojima, Takeshi, Okimura, Itsuki, Iwasawa, Yusuke, Yanaka, Hitomi, Matsuo, Yutaka

Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based P

External link: http://arxiv.org/abs/2404.02431

Academic article

Isolated choroid plexus infarction caused by multiple occlusive cerebrovascular lesions

Authors: Michihide Kajita, MD, Kiyoyuki Yanaka, MD, PhD, Hayato Takeda, MD, Minami Saura, MD, Toshihide Takahashi, MD, PhD, Hitoshi Aiyama, MD, PhD, Shinji Saiki, MD, PhD, Eiichi Ishikawa, MD, PhD

Published in: Radiology Case Reports, Vol 19, Iss 12, Pp 5633-5638 (2024)

The choroid plexus is the secretory tissue responsible for cerebrospinal fluid production in the brain. Ischemia of the choroid plexus is rare because of its abundant blood supply from multiple arterial systems, including the anterior and posterior c

External link: https://doaj.org/article/5c6b41916f414eeb8ba0151174da8ca5

Report

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

Authors: Sekizawa, Ryo, Duan, Nan, Lu, Shuai, Yanaka, Hitomi

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only in Englis

External link: http://arxiv.org/abs/2306.15604

Report

Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models

Authors: Sugimoto, Tomoki, Onoe, Yasumasa, Yanaka, Hitomi

Natural Language Inference (NLI) tasks involving temporal inference remain challenging for pre-trained language models (LMs). Although various datasets have been created for this task, they primarily focus on English and do not address the need for r

External link: http://arxiv.org/abs/2306.10727
