Showing 1 - 10 of 742 for search: '"Zhou, Yuhao"'
With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement). Therefore, it
External link:
http://arxiv.org/abs/2405.00428
Large-scale coronal plasma evolutions can be adequately described by magnetohydrodynamics (MHD) equations. However, full multi-dimensional MHD simulations require substantial computational resources. Given the low plasma $\beta$ in the solar corona,
External link:
http://arxiv.org/abs/2404.17056
Bayesian flow networks (BFNs) iteratively refine the parameters, instead of the samples in diffusion models (DMs), of distributions at various noise levels through Bayesian inference. Owing to its differentiable nature, BFNs are promising in modeling
External link:
http://arxiv.org/abs/2404.15766
Deep neural networks (DNNs) are notoriously vulnerable to adversarial attacks that place carefully crafted perturbations on normal examples to fool DNNs. To better understand such attacks, a characterization of the features carried by adversarial exa
External link:
http://arxiv.org/abs/2403.16176
Knowledge graphs have garnered significant research attention and are widely used to enhance downstream applications. However, most current studies mainly focus on static knowledge graphs, whose facts do not change with time, and disregard their dyna
External link:
http://arxiv.org/abs/2403.04782
Temporal knowledge graph completion (TKGC) aims to fill in missing facts within a given temporal knowledge graph at a specific time. Existing methods, operating in real or complex spaces, have demonstrated promising performance in this task. This pap
External link:
http://arxiv.org/abs/2403.02355
Authors:
Xi, Zhiheng, Chen, Wenxiang, Hong, Boyang, Jin, Senjie, Zheng, Rui, He, Wei, Ding, Yiwen, Liu, Shichun, Guo, Xin, Wang, Junzhe, Guo, Honglin, Shen, Wei, Fan, Xiaoran, Zhou, Yuhao, Dou, Shihan, Wang, Xiao, Zhang, Xinbo, Sun, Peng, Gui, Tao, Zhang, Qi, Huang, Xuanjing
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challe
External link:
http://arxiv.org/abs/2402.05808
Authors:
Dou, Shihan, Liu, Yan, Jia, Haoxiang, Xiong, Limao, Zhou, Enyu, Shen, Wei, Shan, Junjie, Huang, Caishuang, Wang, Xiao, Fan, Xiaoran, Xi, Zhiheng, Zhou, Yuhao, Ji, Tao, Zheng, Rui, Zhang, Qi, Huang, Xuanjing, Gui, Tao
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation qu
External link:
http://arxiv.org/abs/2402.01391
Authors:
Fan, Xiaoran, Ji, Tao, Jiang, Changhao, Li, Shuo, Jin, Senjie, Song, Sirui, Wang, Junke, Hong, Boyang, Chen, Lu, Zheng, Guodong, Zhang, Ming, Huang, Caishuang, Zheng, Rui, Xi, Zhiheng, Zhou, Yuhao, Dou, Shihan, Ye, Junjie, Yan, Hang, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting comp
External link:
http://arxiv.org/abs/2401.17221
Authors:
Wang, Binghai, Zheng, Rui, Chen, Lu, Liu, Yan, Dou, Shihan, Huang, Caishuang, Shen, Wei, Jin, Senjie, Zhou, Enyu, Shi, Chenyu, Gao, Songyang, Xu, Nuo, Zhou, Yuhao, Fan, Xiaoran, Xi, Zhiheng, Zhao, Jun, Wang, Xiao, Ji, Tao, Yan, Hang, Shen, Lixing, Chen, Zhan, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang
Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for
External link:
http://arxiv.org/abs/2401.06080