Showing 1 - 10 of 175 for search: '"Fujita, Yasuhiro"'
In the post-training of large language models (LLMs), Reinforcement Learning from Human Feedback (RLHF) is an effective approach to achieve generation aligned with human preferences. Direct Preference Optimization (DPO) allows for policy training wit
External link:
http://arxiv.org/abs/2411.07595
Author:
Elements, Preferred, Abe, Kenshin, Chubachi, Kaizaburo, Fujita, Yasuhiro, Hirokawa, Yuta, Imajo, Kentaro, Kataoka, Toshiki, Komatsu, Hiroyoshi, Mikami, Hiroaki, Mogami, Tsuguo, Murai, Shogo, Nakago, Kosuke, Nishino, Daisuke, Ogawa, Toru, Okanohara, Daisuke, Ozaki, Yoshihiko, Sano, Shotaro, Suzuki, Shuji, Xu, Tianqi, Yanase, Toshihiko
We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the trainin
External link:
http://arxiv.org/abs/2410.07563
We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. Our method defines the neural scene representation on the mesh surface points and signed distances from the surface of a human body me
External link:
http://arxiv.org/abs/2201.01683
Author:
Fujita, Yasuhiro, Uenishi, Kota, Ummadisingu, Avinash, Nagarajan, Prabhat, Masuda, Shimpei, Castro, Mario Ynocente
Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-b
External link:
http://arxiv.org/abs/2007.08082
Model-based reinforcement learning methods typically learn models for high-dimensional state spaces by aiming to reconstruct and predict the original observations. However, drawing inspiration from model-free reinforcement learning, we propose learni
External link:
http://arxiv.org/abs/1912.04201
Published in:
Journal of Machine Learning Research 22(77) (2021) 1-14
In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework. ChainerRL implements a comprehensive set of DRL algorithms and techniques drawn from state-of-
External link:
http://arxiv.org/abs/1912.03905
In this paper, we introduce and investigate a class P of continuous and periodic functions on R. The class P is defined so that second-order central differences of a function satisfy some concavity-type estimate. Although this definition seems to be
External link:
http://arxiv.org/abs/1908.00888
Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called pseudo-hyperbolic Gaussian, a Gaus
External link:
http://arxiv.org/abs/1902.02992
Published in:
In Journal of Cardiovascular Magnetic Resonance 16 February 2023 25(1)
Author:
Clavera, Ignasi, Rothfuss, Jonas, Schulman, John, Fujita, Yasuhiro, Asfour, Tamim, Abbeel, Pieter
Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance
External link:
http://arxiv.org/abs/1809.05214