Showing 1 - 7 of 7 for the search: '"Feng, Guhao"'
Author:
Feng, Guhao, Yang, Kai, Gu, Yuntian, Ai, Xinyue, Luo, Shengjie, Sun, Jiacheng, He, Di, Li, Zhenguo, Wang, Liwei
Despite the remarkable success of Transformer-based Large Language Models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis …
External link:
http://arxiv.org/abs/2410.13857
Author:
Zhong, Han, Feng, Guhao, Xiong, Wei, Cheng, Xinle, Zhao, Li, He, Di, Bian, Jiang, Wang, Liwei
In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the … (see the reward-shaping sketch after this entry)
External link:
http://arxiv.org/abs/2404.18922
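
The snippet above contrasts sparse, sentence-level rewards with the token-level credit assignment that PPO-style training needs. The sketch below is only a generic illustration of how a single sequence-level reward is commonly spread over tokens (full reward at the last token, a small KL shaping term elsewhere); the function names and the KL-penalty shaping are assumptions, not the method of arXiv:2404.18922.

# Illustrative sketch only: turning a sparse, sentence-level reward into
# per-token rewards for PPO-style RLHF. Generic recipe, not the paper's method.
from typing import List

def token_rewards(
    sequence_reward: float,        # scalar score from a reward model (sentence level)
    policy_logprobs: List[float],  # log-probs of generated tokens under the policy
    ref_logprobs: List[float],     # log-probs of the same tokens under a reference model
    kl_coef: float = 0.1,
) -> List[float]:
    """Every token gets a small KL-based shaping term; the full reward
    arrives only at the final token (the sparse signal)."""
    rewards = []
    for step, (lp_pi, lp_ref) in enumerate(zip(policy_logprobs, ref_logprobs)):
        # Per-token KL penalty keeps the policy close to the reference model.
        r = -kl_coef * (lp_pi - lp_ref)
        if step == len(policy_logprobs) - 1:
            # Sparse signal: the reward model's score is credited to the last token.
            r += sequence_reward
        rewards.append(r)
    return rewards

if __name__ == "__main__":
    print(token_rewards(1.5, [-0.2, -0.9, -0.4], [-0.3, -0.7, -0.5]))
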
Author:
Yang, Kai, Ackermann, Jan, He, Zhenyu, Feng, Guhao, Zhang, Bohang, Feng, Yunzhen, Ye, Qiwei, He, Di, Wang, Liwei
As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives …
External link:
http://arxiv.org/abs/2402.13934
Author:
He, Zhenyu, Feng, Guhao, Luo, Shengjie, Yang, Kai, Wang, Liwei, Xu, Jingjing, Zhang, Zhi, Yang, Hongxia, He, Di
In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding … (see the sketch after this entry)
External link:
http://arxiv.org/abs/2401.16421
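
The BiPE snippet describes giving each position two coordinates: where the token sits inside its segment and which segment it belongs to. The sketch below is a rough illustration of that bilevel idea under simple assumptions (segments end at a boundary token, the two levels are blended by summing sinusoidal embeddings); it is not the authors' implementation from arXiv:2401.16421.

# Rough illustration of a bilevel positional encoding: each token position is
# described by (a) its offset inside its segment and (b) the index of its
# segment. Segment boundaries, the sinusoidal mixing, and all names here are
# assumptions for illustration only.
import math
from typing import List, Tuple

def bilevel_positions(tokens: List[str], boundary: str = ".") -> List[Tuple[int, int]]:
    """Map each token to (intra_segment_position, inter_segment_index),
    starting a new segment after every boundary token."""
    positions = []
    segment_idx, offset = 0, 0
    for tok in tokens:
        positions.append((offset, segment_idx))
        if tok == boundary:          # segment ends here; next token starts a new one
            segment_idx += 1
            offset = 0
        else:
            offset += 1
    return positions

def sinusoidal(pos: int, dim: int) -> List[float]:
    """Standard sinusoidal embedding of a single integer position."""
    return [
        math.sin(pos / 10000 ** (2 * (i // 2) / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** (2 * (i // 2) / dim))
        for i in range(dim)
    ]

def bipe_like_embedding(tokens: List[str], dim: int = 8) -> List[List[float]]:
    """Blend the two levels by summing their sinusoidal embeddings
    (one simple choice; the actual blending in BiPE may differ)."""
    return [
        [a + b for a, b in zip(sinusoidal(intra, dim), sinusoidal(inter, dim))]
        for intra, inter in bilevel_positions(tokens)
    ]

if __name__ == "__main__":
    print(bilevel_positions(["The", "cat", "sat", ".", "It", "slept", "."]))
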
Author:
Feng, Guhao, Zhong, Han
Reinforcement Learning (RL) encompasses diverse paradigms, including model-based RL, policy-based RL, and value-based RL, each tailored to approximate the model, optimal policy, and optimal value function, respectively. This work investigates the potential … (a toy illustration of the three target objects follows this entry)
External link:
http://arxiv.org/abs/2312.17248
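
The snippet above distinguishes the RL paradigms by what each one approximates: the environment model, the optimal policy, or the optimal value function. The toy tabular MDP below is invented purely to make those three objects concrete; arXiv:2312.17248 studies their representation complexity, not this example.

# Toy 2-state, 2-action MDP showing the three objects the paradigms approximate.
GAMMA = 0.9
STATES, ACTIONS = ["s0", "s1"], ["a0", "a1"]

# Model-based RL approximates the environment model: P(s' | s, a) and r(s, a).
P = {
    ("s0", "a0"): {"s0": 0.9, "s1": 0.1},
    ("s0", "a1"): {"s0": 0.2, "s1": 0.8},
    ("s1", "a0"): {"s0": 0.5, "s1": 0.5},
    ("s1", "a1"): {"s0": 0.0, "s1": 1.0},
}
R = {("s0", "a0"): 0.0, ("s0", "a1"): 1.0, ("s1", "a0"): 0.0, ("s1", "a1"): 2.0}

# Value-based RL approximates the optimal value function V*(s)
# (computed here by plain value iteration).
V = {s: 0.0 for s in STATES}
for _ in range(200):
    V = {
        s: max(
            R[s, a] + GAMMA * sum(p * V[s2] for s2, p in P[s, a].items())
            for a in ACTIONS
        )
        for s in STATES
    }

# Policy-based RL approximates the optimal policy pi*(a | s) directly
# (here simply read off greedily from V*).
pi = {
    s: max(ACTIONS, key=lambda a: R[s, a] + GAMMA * sum(p * V[s2] for s2, p in P[s, a].items()))
    for s in STATES
}

print(V, pi)
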
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical … (see the prompt sketch after this entry)
External link:
http://arxiv.org/abs/2305.15408
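
Chain-of-Thought prompting, referenced in the snippet above, asks the model to produce intermediate reasoning steps before its final answer, typically by prepending a worked exemplar. The sketch below only illustrates that prompting pattern; the exemplar and any downstream model call are hypothetical and not taken from arXiv:2305.15408.

# Minimal illustration of Chain-of-Thought prompting: a worked exemplar with
# explicit intermediate steps precedes the question, and the prompt invites
# step-by-step reasoning before the answer. Exemplar text is invented.
COT_EXEMPLAR = (
    "Q: A shop has 3 boxes with 12 apples each. It sells 10 apples. "
    "How many apples are left?\n"
    "A: Let's think step by step. 3 boxes * 12 apples = 36 apples. "
    "36 - 10 = 26. The answer is 26.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the step-by-step exemplar and invite intermediate reasoning."""
    return COT_EXEMPLAR + f"Q: {question}\nA: Let's think step by step."

if __name__ == "__main__":
    print(build_cot_prompt("If a train travels 60 km/h for 2.5 hours, how far does it go?"))
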
Recently, subgraph GNNs have emerged as an important direction for developing expressive graph neural networks (GNNs). While numerous architectures have been proposed, so far there is still a limited understanding of how various design paradigms differ …
External link:
http://arxiv.org/abs/2302.07090