Showing 1 - 10 of 42 for search: '"Zhu, Jiangcheng"'
Author:
01.AI, Young, Alex, Chen, Bei, Li, Chao, Huang, Chengen, Zhang, Ge, Zhang, Guanwei, Li, Heng, Zhu, Jiangcheng, Chen, Jianqun, Chang, Jing, Yu, Kaidong, Liu, Peng, Liu, Qiang, Yue, Shawn, Yang, Senbin, Yang, Shiming, Yu, Tao, Xie, Wen, Huang, Wenhao, Hu, Xiaohui, Ren, Xiaoyi, Niu, Xinyao, Nie, Pengcheng, Xu, Yuchi, Liu, Yudong, Wang, Yue, Cai, Yuxuan, Gu, Zhenyu, Liu, Zhiyuan, Dai, Zonghong
We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long…
External link:
http://arxiv.org/abs/2403.04652
Author:
Li, Yang, Xiong, Kun, Zhang, Yingping, Zhu, Jiangcheng, Mcaleer, Stephen, Pan, Wei, Wang, Jun, Dai, Zonghong, Yang, Yaodong
This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records…
External link:
http://arxiv.org/abs/2308.04719
Author:
Song, Yan, Jiang, He, Tian, Zheng, Zhang, Haifeng, Zhang, Yingping, Zhu, Jiangcheng, Dai, Zonghong, Zhang, Weinan, Wang, Jun
Published in:
Machine Intelligence Research (2024)
Little multi-agent reinforcement learning (MARL) research on Google Research Football (GRF) focuses on the 11v11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work…
External link:
http://arxiv.org/abs/2305.09458
Author:
Fan, Jiajun, Zhuang, Yuzheng, Liu, Yuecheng, Hao, Jianye, Wang, Bin, Zhu, Jiangcheng, Wang, Hao, Xia, Shu-Tao
The exploration problem is one of the main challenges in deep reinforcement learning (RL). Recent promising works tried to handle the problem with population-based methods, which collect samples with diverse behaviors derived from a population of dif…
External link:
http://arxiv.org/abs/2305.05239
Cooperative multi-agent reinforcement learning (MARL) has made prominent progress in recent years. For training efficiency and scalability, most of the MARL algorithms make all agents share the same policy or value network. However, in many complex m…
External link:
http://arxiv.org/abs/2205.02561
Due to the partial observability and communication constraints in many multi-agent reinforcement learning (MARL) tasks, centralized training with decentralized execution (CTDE) has become one of the most widely used MARL paradigms. In CTDE, centraliz…
External link:
http://arxiv.org/abs/2203.08412
Author:
Zhao, Jian, Yang, Mingyu, Zhao, Youpeng, Hu, Xunhan, Zhou, Wengang, Zhu, Jiangcheng, Li, Houqiang
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a team reward and observing the next state. During the interactions, the uncertainty of environment and reward will inevitably induce…
External link:
http://arxiv.org/abs/2202.10134
Author:
Zhao, Jian, Zhang, Yue, Hu, Xunhan, Wang, Weixun, Zhou, Wengang, Hao, Jianye, Zhu, Jiangcheng, Li, Houqiang
In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards. In the absence of individual reward signals, credit assignment mechanisms are usually introduced to discriminate the contributions…
External link:
http://arxiv.org/abs/2202.04427
Author:
Mguni, David Henry, Jafferjee, Taher, Wang, Jianhong, Slumbers, Oliver, Perez-Nieves, Nicolas, Tong, Feifei, Yang, Li, Zhu, Jiangcheng, Yang, Yaodong, Wang, Jun
Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agent systems, coordinated exploration and behaviour is critical for agents to jointly achieve optimal outcomes. In this paper, we introduce a new general…
External link:
http://arxiv.org/abs/2112.02618
Safety has become one of the main challenges of applying deep reinforcement learning to real world systems. Currently, the incorporation of external knowledge such as human oversight is the only means to prevent the agent from visiting the catastroph…
External link:
http://arxiv.org/abs/2111.05819