Showing 1 - 10 of 148 for search: '"Chen, Ningyuan"'
Intensity control is a type of continuous-time dynamic optimization problem with many important applications in Operations Research, including queueing and revenue management. In this study, we adapt the reinforcement learning framework to intensity control…
External link: http://arxiv.org/abs/2406.05358
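The snippet above does not show the paper's algorithm. As a generic illustration only, the sketch below casts intensity control for a single-server queue with a controllable service rate as a uniformized discrete-time MDP and runs tabular Q-learning; all rates, costs, and the truncation level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-server queue: Poisson arrivals at rate lam, controllable
# service intensity chosen from a finite menu at every uniformized event.
lam, mus = 1.0, np.array([0.5, 1.5, 2.5])   # assumed arrival/service rates
hold_cost, effort_cost = 1.0, 0.4           # assumed cost coefficients
N = 20                                      # queue-length truncation
Lam = lam + mus.max()                       # uniformization constant
gamma, alpha, eps = 0.99, 0.05, 0.1

Q = np.zeros((N + 1, len(mus)))             # tabular Q over (queue length, action)
x = 0
for t in range(200_000):
    a = rng.integers(len(mus)) if rng.random() < eps else int(Q[x].argmax())
    mu = mus[a]
    u = rng.random() * Lam                  # one uniformized transition
    if u < lam:
        x2 = min(x + 1, N)                  # arrival (blocked at the cap)
    elif u < lam + mu:
        x2 = max(x - 1, 0)                  # service completion
    else:
        x2 = x                              # self-loop filler event
    r = -(hold_cost * x + effort_cost * mu) / Lam   # negative cost per event
    Q[x, a] += alpha * (r + gamma * Q[x2].max() - Q[x, a])
    x = x2

print("learned intensity per queue length:", mus[Q.argmax(axis=1)])
```

Uniformization is what makes the continuous-time problem amenable to standard RL updates: it converts the controlled intensity process into an equivalent discrete-time MDP with one transition per event.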
In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision…
External link: http://arxiv.org/abs/2406.02426
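As a hedged illustration of the setup described above (not the paper's estimator), the sketch below makes a conditional newsvendor decision by restricting the empirical demand distribution to the k historical samples whose covariates are nearest to the new observation; the data-generating process and cost parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented history: 2-d covariate X and demand Y that depends on X.
n = 500
X = rng.uniform(0, 1, size=(n, 2))
Y = 5 * X[:, 0] + rng.exponential(1.0, size=n)

def knn_newsvendor(x0, k=25, cu=2.0, co=1.0):
    """Order quantity for covariate x0 via k-nearest-neighbor conditioning.

    Restrict the empirical demand distribution to the k historical samples
    whose covariates are closest to x0; the newsvendor-optimal order is
    then the cu / (cu + co) quantile of that conditional sample.
    """
    dist = np.linalg.norm(X - x0, axis=1)
    neighbors = Y[np.argsort(dist)[:k]]
    return np.quantile(neighbors, cu / (cu + co))

print(knn_newsvendor(np.array([0.8, 0.3])))
```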
When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of each other's existence and use multi-armed bandit algorithms to choose their actions, which is referred to as the "blindfolded game"…
External link: http://arxiv.org/abs/2405.17463
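A minimal simulation of the blindfolded setting, assuming two invented 2x2 payoff matrices: each player runs standard UCB1 over its own actions and observes only its own reward, never the opponent's action or payoff. This illustrates the setup only; the paper's analysis is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented 2x2 mean-payoff matrices; rows index the player's OWN action.
A = np.array([[0.7, 0.2], [0.4, 0.5]])   # player 1
B = np.array([[0.3, 0.6], [0.8, 0.1]])   # player 2

def ucb_pick(sums, counts, t):
    if 0 in counts:                        # play every action once first
        return int(np.argmin(counts))
    means = sums / counts
    return int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))

s1 = np.zeros(2); c1 = np.zeros(2)
s2 = np.zeros(2); c2 = np.zeros(2)
for t in range(1, 10_001):
    i = ucb_pick(s1, c1, t)
    j = ucb_pick(s2, c2, t)
    # Each player sees only its own Bernoulli reward, never the opponent.
    r1 = float(rng.random() < A[i, j])
    r2 = float(rng.random() < B[j, i])
    s1[i] += r1; c1[i] += 1
    s2[j] += r2; c2[j] += 1

print("action frequencies:", c1 / c1.sum(), c2 / c2.sum())
```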
Authors: Chen, Ningyuan; Li, Wenhao
We consider a decision maker allocating one unit of a renewable and divisible resource across a number of arms in each period. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional…
External link: http://arxiv.org/abs/2306.16578
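To make the reward model concrete, the sketch below simulates arms whose reward mean and variance both scale linearly with the allocated fraction, matching the description above, and runs a naive explore-then-commit allocation as a baseline. The paper's actual algorithm and regret analysis are not reflected here, and all parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed model matching the description: for allocation a_k, arm k returns
# a reward with mean theta_k * a_k and variance sigma2_k * a_k.
theta = np.array([0.3, 0.8, 0.5])        # unknown to the learner
sigma2 = np.array([0.2, 0.2, 0.2])
K = len(theta)

def pull(alloc):
    return rng.normal(theta * alloc, np.sqrt(sigma2 * alloc + 1e-12))

# Naive explore-then-commit baseline: split evenly, estimate, then commit.
T_explore, est = 200, np.zeros(K)
for _ in range(T_explore):
    est += pull(np.full(K, 1.0 / K)) * K  # rescale: unbiased for theta
est /= T_explore
best = int(est.argmax())
print("estimated means:", est.round(2), "-> commit all resource to arm", best)
```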
Commercial AI solutions provide analysts and managers with data-driven business intelligence for a wide range of decisions, such as demand forecasting and pricing. However, human analysts may have their own insights and experiences about the decision…
External link: http://arxiv.org/abs/2211.11028
Product bundling is a common selling mechanism in online retailing. To set profitable bundle prices, the seller needs to learn consumer preferences from transaction data. When customers purchase bundles or multiple products, classical methods…
External link: http://arxiv.org/abs/2209.04942
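As one generic way to learn valuations from transaction data (a censored-demand sketch of my own, not the paper's method), the code below assumes a customer buys a bundle when a normally distributed valuation exceeds the posted price, and recovers the valuation parameters by maximum likelihood over a coarse grid.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Invented transactions: a customer buys the bundle at posted price p iff a
# latent valuation v ~ N(mu, sigma^2) satisfies v >= p; only buy/no-buy is seen.
mu_true, sigma_true = 10.0, 2.0
prices = rng.uniform(5, 15, size=2000)
buys = rng.normal(mu_true, sigma_true, size=2000) >= prices

def neg_loglik(params):
    mu, sigma = params
    p_buy = np.clip(norm.sf((prices - mu) / sigma), 1e-12, 1 - 1e-12)
    return -np.sum(buys * np.log(p_buy) + (~buys) * np.log(1 - p_buy))

# Coarse grid search keeps the sketch short; any optimizer would do.
grid = [(m, s) for m in np.linspace(8, 12, 41) for s in np.linspace(1, 3, 21)]
mu_hat, sigma_hat = min(grid, key=neg_loglik)
print(f"estimated valuation distribution: mean={mu_hat:.2f}, sd={sigma_hat:.2f}")
```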
Authors: Chen, Ningyuan; Yang, Shuoguang
In the multi-armed bandit framework, two formulations are commonly employed to handle time-varying reward distributions: the adversarial bandit and the nonstationary bandit. Although their oracles, algorithms, and regret analyses differ significantly…
External link: http://arxiv.org/abs/2201.01628
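For reference, the adversarial side of this comparison is canonically handled by EXP3. The sketch below runs textbook loss-based EXP3 on a piecewise-stationary reward sequence where the best arm switches halfway through; all parameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)

def exp3(rewards, eta=0.01):
    """Textbook loss-based EXP3 on a reward table of shape (T, K) in [0, 1]."""
    T, K = rewards.shape
    logw, total = np.zeros(K), 0.0
    for t in range(T):
        p = np.exp(logw - logw.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        total += rewards[t, a]
        loss = 1.0 - rewards[t, a]
        logw[a] -= eta * loss / p[a]       # importance-weighted loss estimate
    return total

# Piecewise-stationary instance: the best arm switches at T/2.
T, K = 10_000, 2
rew = np.zeros((T, K))
rew[: T // 2, 0] = 1.0
rew[T // 2 :, 1] = 1.0
print("EXP3 total reward:", exp3(rew), "out of", T)
```

On this instance EXP3 earns roughly T/2, the payoff of the best *fixed* arm in hindsight, which is exactly why the nonstationary formulation measures regret against a time-varying benchmark instead.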
It has recently been shown in the literature that sample averages from online learning experiments are biased when used to estimate the mean reward. To correct the bias, off-policy evaluation methods, including importance sampling and doubly robust…
External link: http://arxiv.org/abs/2108.00236
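The bias and its importance-sampling correction are easy to see in simulation. Below, an epsilon-greedy learner collects data on two arms with identical means: the naive sample average of an arm typically shows a visible negative bias, while reweighting each observation by the inverse of the arm's propensity restores an unbiased estimate. This is a generic demonstration, not the paper's estimator, and all parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

mu = np.array([0.5, 0.5])       # two arms with IDENTICAL true means
T, reps, eps = 50, 4000, 0.2
naive, ipw = [], []
for _ in range(reps):
    sums, counts, z = np.zeros(2), np.zeros(2), 0.0
    for t in range(T):
        greedy = int(np.argmax(sums / np.maximum(counts, 1)))
        p0 = (1 - eps) * (greedy == 0) + eps / 2   # propensity of arm 0
        a = greedy if rng.random() > eps else int(rng.integers(2))
        r = float(rng.random() < mu[a])
        sums[a] += r; counts[a] += 1
        if a == 0:
            z += r / p0            # inverse-propensity-weighted reward
    if counts[0] > 0:
        naive.append(sums[0] / counts[0])
    ipw.append(z / T)

print("true mean 0.5 | naive:", round(float(np.mean(naive)), 3),
      "| IPW:", round(float(np.mean(ipw)), 3))
```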
We study model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment, in terms of the average reward over an infinite horizon…
External link: http://arxiv.org/abs/2107.03635
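The key object behind such POMDP oracles is the belief state. As a minimal sketch (a hand-written Bayes filter on an invented two-state POMDP, not the paper's learning algorithm), the update below shows how each action-observation pair maps one belief to the next, which is what turns a POMDP into a fully observed belief-state MDP.

```python
import numpy as np

# Invented 2-state, 2-action, 2-observation POMDP.
# P[a][s, s'] : transition probabilities; O[a][s', o] : observation likelihoods.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.4, 0.6]])}

def belief_update(b, a, o):
    """One step of the Bayes filter over the hidden state."""
    b_pred = b @ P[a]             # predict the next-state distribution
    b_new = b_pred * O[a][:, o]   # reweight by the observation likelihood
    return b_new / b_new.sum()    # normalize back to a probability vector

b = np.array([0.5, 0.5])          # uniform prior belief
for a, o in [(0, 1), (1, 0), (0, 0)]:
    b = belief_update(b, a, o)
    print("belief after (a=%d, o=%d):" % (a, o), b.round(3))
```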
In prescriptive analytics, the decision-maker observes historical samples of $(X, Y)$, where $Y$ is the uncertain problem parameter and $X$ is the concurrent covariate, without knowing the joint distribution. Given an additional covariate observation…
External link: http://arxiv.org/abs/2106.05724
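A common construction in this literature is kernel-weighted sample-average approximation, shown here as a generic sketch rather than necessarily the paper's proposal: reweight the historical samples by covariate proximity to the new observation and minimize the resulting weighted empirical cost. The data and cost function below are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented history of (X, Y) pairs.
X = rng.normal(size=(400, 3))
Y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.3, size=400)

def weighted_saa(x0, candidates, cost, h=0.5):
    """Pick the decision minimizing a covariate-weighted empirical cost.

    Gaussian-kernel weights concentrate the sample average on histories
    whose covariates resemble the new observation x0.
    """
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * h ** 2))
    w /= w.sum()
    objective = [np.sum(w * cost(z, Y)) for z in candidates]
    return candidates[int(np.argmin(objective))]

# Example: minimize expected absolute deviation from Y given x0 = 0.
zs = np.linspace(-3, 3, 121)
print(weighted_saa(np.zeros(3), zs, cost=lambda z, y: np.abs(z - y)))
```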