Showing 1 - 10 of 282 for search: '"LUO Haipeng"'
Author:
LUO Haipeng, QU Hongren, DING Bo, REN Xiu, ZHAO Linna, BAI Jichao, WANG Yaping, LIN Lan, CUI Shenghui
Published in:
Zhongguo shipin weisheng zazhi, Vol 35, Iss 10, Pp 1475-1481 (2023)
Objective: The aim of this study was to detect botulinum toxin and Clostridium botulinum (C. botulinum) in 30 batches of infant formula milk powder obtained from an enterprise and to analyze the whole genome of the strain of C. botulinum type B isolated …
External link:
https://doaj.org/article/30d4cb5e20434fad9a47a75d7afcd132
Author:
Luo, Haipeng, Sun, Qingfeng, Xu, Can, Zhao, Pu, Lin, Qingwei, Lou, Jianguang, Chen, Shifeng, Tang, Yansong, Chen, Weizhu
Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by …
External link:
http://arxiv.org/abs/2407.10627
Author:
Cai, Yang, Farina, Gabriele, Grand-Clément, Julien, Kroer, Christian, Lee, Chung-Wei, Luo, Haipeng, Zheng, Weiqiang
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent …
External link:
http://arxiv.org/abs/2406.10631
We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret …
External link:
http://arxiv.org/abs/2405.20678
Interaction-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with …
External link:
http://arxiv.org/abs/2405.20677
We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously …
External link:
http://arxiv.org/abs/2405.19374
In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL with Aggregate …
External link:
http://arxiv.org/abs/2405.07637
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave …
External link:
http://arxiv.org/abs/2403.08171
Author:
Zhang, Mengxiao, Luo, Haipeng
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability …
External link:
http://arxiv.org/abs/2402.08126
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual version of …
External link:
http://arxiv.org/abs/2402.08127