Showing 1 - 10 of 61 for search: '"Zhu, Ruihao"'
Reward learning plays a pivotal role in Reinforcement Learning from Human Feedback (RLHF), ensuring the alignment of language models. The Bradley-Terry (BT) model stands as the prevalent choice for capturing human preferences from datasets containing
External link:
http://arxiv.org/abs/2410.05328
Motivated by the concept of satisficing in decision-making, we consider the problem of satisficing exploration in bandit optimization. In this setting, the learner aims at selecting satisficing arms (arms with mean reward exceeding a certain threshol
External link:
http://arxiv.org/abs/2406.06802
Motivated by the importance of explainability in modern machine learning, we design bandit algorithms that are efficient and interpretable. A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown m
External link:
http://arxiv.org/abs/2310.14751
Among creative professionals, Generative Artificial Intelligence (GenAI) has sparked excitement over its capabilities and fear over unanticipated consequences. How does GenAI impact User Experience Design (UXD) practice, and are fears warranted? We i
External link:
http://arxiv.org/abs/2309.15237
Motivated by the prevalence of "price protection guarantee", which allows a customer who purchased a product in the past to receive a refund from the seller during the so-called price protection period (typically defined as a certain time window aft
External link:
http://arxiv.org/abs/2211.01798
The rise of big data analytics has automated the decision-making of companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier who needs to respond to the inventory d
External link:
http://arxiv.org/abs/2211.04586
Author:
Zhu, Ruihao
Rapid development of data science technologies has enabled data-driven algorithms for many important operational problems. Existing data-driven solutions often require the operational environments to be stationary. However, recent examples have sho
Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action space, we consider risk-aware bandits optimization with applications in smart order routing (SOR). Specifically, based on
External link:
http://arxiv.org/abs/2208.02389
Quadruped robots can traverse a multitude of terrains with greater ease when compared to wheeled robots. Traditional rigid quadruped robots possess severe limitations as they lack structural compliance. Most of the existing soft quadruped robots are
External link:
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286348
Published in:
Carbohydrate Polymer Technologies and Applications, December 2024, 8