Showing 1 - 10 of 1,167 results for the search: '"NAKAYAMA, HIDEKI"'
Multimodal Large Language Models (MLLMs) have made notable advances in visual understanding, yet their abilities to recognize objects modified by specific attributes remain an open question. To address this, we explore MLLMs' reasoning capabilities…
External link:
http://arxiv.org/abs/2411.17794
Recently, leveraging big data in deep learning has led to significant performance improvements, as confirmed in applications like mental state decoding using fMRI data. However, fMRI datasets remain relatively small in scale, and the inherent issue…
External link:
http://arxiv.org/abs/2410.04383
Recently, text-to-speech (TTS) models based on large language models (LLMs) that translate natural language text into sequences of discrete audio tokens have attracted considerable research attention, with advances in neural audio codec (NAC) models using…
External link:
http://arxiv.org/abs/2410.04380
Diffusion models have recently shown the ability to generate high-quality images. However, controlling their generation process still poses challenges. Image style transfer is one such challenge: it transfers the visual attributes of a…
External link:
http://arxiv.org/abs/2410.01366
Author:
Nishida, Noriki, Nakayama, Hideki
Published in:
Transactions of the Association for Computational Linguistics, Vol. 8, pp. 215-230 (2020)
In this paper, we introduce an unsupervised discourse constituency parsing algorithm. We use Viterbi EM with a margin-based criterion to train a span-based discourse parser in an unsupervised manner. We also propose initialization methods for Viterbi…
External link:
https://doaj.org/article/ca03797390e64f49885d1b8c10f56f32
This paper investigates the quality of multi-agent dialogues in simulations powered by Large Language Models (LLMs). Analyzing dialogues and memory over multiple sessions revealed significant issues such as repetition, inconsistency, and hallucination…
External link:
http://arxiv.org/abs/2407.09897
This work focuses on training dataset enhancement of informative relational triplets for Scene Graph Generation (SGG). Due to the lack of effective supervision, the current SGG model predictions perform poorly for informative relational triplets with…
External link:
http://arxiv.org/abs/2406.19316
This research investigates prompt designs for evaluating generated texts using large language models (LLMs). While LLMs are increasingly used for scoring various inputs, creating effective prompts for open-ended text evaluation remains challenging due…
External link:
http://arxiv.org/abs/2406.09972
This research investigates the effect of prompt design on dialogue evaluation using large language models (LLMs). While LLMs are increasingly used for scoring various inputs, creating effective prompts for dialogue evaluation remains challenging due…
External link:
http://arxiv.org/abs/2406.02863
Enhancing user engagement through personalization in conversational agents has gained significance, especially with the advent of large language models that generate fluent responses. Personalized dialogue generation, however, is multifaceted and…
External link:
http://arxiv.org/abs/2405.17974