Showing 1 - 10 of 88 for search: '"Sasano, Ryohei"'
Prediction of the future citation counts of papers is increasingly important to find interesting papers among an ever-growing number of papers. Although a paper's main text is an important factor for citation count prediction, it is difficult to hand
External link:
http://arxiv.org/abs/2410.04404
Author:
Tsukagoshi, Hayato, Sasano, Ryohei
We report the development of Ruri, a series of Japanese general text embedding models. While the development of general-purpose text embedding models in English and multilingual contexts has been active in recent years, model development in Japanese
External link:
http://arxiv.org/abs/2409.07737
Large language models (LLMs) are supposed to acquire unconscious human knowledge and feelings, such as social common sense and biases, by training models from large amounts of text. However, it is not clear how much the sentiments of specific social
External link:
http://arxiv.org/abs/2408.04293
In recent years, neural machine translation (NMT) has been widely used in everyday life. However, the current NMT lacks a mechanism to adjust the difficulty level of translations to match the user's language level. Additionally, due to the bias in th
External link:
http://arxiv.org/abs/2408.04217
Author:
Ishizuki, Yukiko, Kuribayashi, Tatsuki, Matsubayashi, Yuichiroh, Sasano, Ryohei, Inui, Kentaro
Speakers sometimes omit certain arguments of a predicate in a sentence; such omission is especially frequent in pro-drop languages. This study addresses a question about ellipsis -- what can explain the native speakers' ellipsis decisions? -- motivat
External link:
http://arxiv.org/abs/2404.11315
Author:
Tsukagoshi, Hayato, Hirao, Tsutomu, Morishita, Makoto, Chousa, Katsuki, Sasano, Ryohei, Takeda, Koichi
The task of Split and Rephrase, which splits a complex sentence into multiple simple sentences with the same meaning, improves readability and enhances the performance of downstream tasks in natural language processing (NLP). However, while Split and
External link:
http://arxiv.org/abs/2404.09002
There are several linguistic claims about situations where words are more likely to be used as metaphors. However, few studies have sought to verify such claims with large corpora. This study entails a large-scale, corpus-based analysis of certain ex
External link:
http://arxiv.org/abs/2404.01029
Decoder-based large language models (LLMs) have shown high performance on many tasks in natural language processing. This is also true for sentence embedding learning, where a decoder-based model, PromptEOL, has achieved the best performance on seman
External link:
http://arxiv.org/abs/2402.15132
We report the development of Japanese SimCSE, Japanese sentence embedding models fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese that can be used as a baseline in sentence embedding research, we conducted exten
External link:
http://arxiv.org/abs/2310.19349
It has been known to be difficult to generate adequate sports updates from a sequence of vast amounts of diverse live tweets, although the live sports viewing experience with tweets is gaining popularity. In this paper, we focus on soccer matches
External link:
http://arxiv.org/abs/2310.16368