Showing 1 - 10 of 282 for search: '"LUO Haipeng"'
Author:
LUO Haipeng, QU Hongren, DING Bo, REN Xiu, ZHAO Linna, BAI Jichao, WANG Yaping, LIN Lan, CUI Shenghui
Published in:
Zhongguo shipin weisheng zazhi, Vol 35, Iss 10, Pp 1475-1481 (2023)
Objective: The aim of this study was to detect botulinum toxin and Clostridium botulinum (C. botulinum) in 30 batches of infant formula milk powder obtained from an enterprise and to analyze the whole genome of the strain of C. botulinum type B isolated …
External link:
https://doaj.org/article/30d4cb5e20434fad9a47a75d7afcd132
Author:
Luo, Haipeng, Sun, Qingfeng, Xu, Can, Zhao, Pu, Lin, Qingwei, Lou, Jianguang, Chen, Shifeng, Tang, Yansong, Chen, Weizhu
Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by …
External link:
http://arxiv.org/abs/2407.10627
Author:
Cai, Yang, Farina, Gabriele, Grand-Clément, Julien, Kroer, Christian, Lee, Chung-Wei, Luo, Haipeng, Zheng, Weiqiang
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent …
External link:
http://arxiv.org/abs/2406.10631
We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret …
External link:
http://arxiv.org/abs/2405.20678
Interaction-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with …
External link:
http://arxiv.org/abs/2405.20677
We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously …
External link:
http://arxiv.org/abs/2405.19374
In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL with Aggregate …
External link:
http://arxiv.org/abs/2405.07637
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave …
External link:
http://arxiv.org/abs/2403.08171
Author:
Zhang, Mengxiao, Luo, Haipeng
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability …
External link:
http://arxiv.org/abs/2402.08126
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual version of …
External link:
http://arxiv.org/abs/2402.08127