Showing 1 - 10 of 46,673 results for search: '"Evaluation of Learning"'
Author:
Shimizu, Tatsuhiro, Tanaka, Koichi, Kishimoto, Ren, Kiyohara, Haruka, Nomura, Masahiro, Saito, Yuta
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items…
External link:
http://arxiv.org/abs/2408.11202
Author:
Hsieh, Po-Yu, Hou, June-Hao
The control and modeling of robot dynamics have increasingly adopted model-free control strategies using machine learning. Given the non-linear elastic nature of bionic robotic systems, learning-based methods provide reliable alternatives by utilizing…
External link:
http://arxiv.org/abs/2407.02428
Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term…
External link:
http://arxiv.org/abs/2404.15691
Author:
Jiang, Yifan, Zhang, Jiarui, Sun, Kexuan, Sourati, Zhivar, Ahrabian, Kian, Ma, Kaixin, Ilievski, Filip, Pujara, Jay
While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to the Sudoku puzzles, abstract visual…
External link:
http://arxiv.org/abs/2404.13591
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations…
External link:
http://arxiv.org/abs/2402.14664
Author:
Mukherjee, Debarshi
Published in:
3D: IBA Journal of Management & Leadership, Jul-Dec 2024, Vol. 16, Issue 1, p. 62-74, 13 p.
Author:
Zhang, Yi, Imai, Kosuke
While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy…
External link:
http://arxiv.org/abs/2311.02467
For industrial learning-to-rank (LTR) systems, it is common that the output of a ranking model is modified, either as a result of post-processing logic that enforces business requirements, or as a result of unforeseen design flaws or bugs present in…
External link:
http://arxiv.org/abs/2311.01828
Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment…
External link:
http://arxiv.org/abs/2309.08748
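
Several records in this listing concern off-policy evaluation (OPE). Purely as a general illustration of that technique, and not as the estimator proposed in any of the papers above, the following is a minimal Python sketch of the standard inverse propensity scoring (IPS) estimator; the function name and the synthetic logged data are hypothetical.

import numpy as np

def ips_policy_value(rewards, logging_propensities, target_propensities):
    """IPS estimate of a target policy's value from logged bandit data.

    rewards              -- observed rewards r_i under the logging policy
    logging_propensities -- pi_0(a_i | x_i) for each logged action a_i
    target_propensities  -- pi_e(a_i | x_i), the target policy's probability of a_i
    """
    importance_weights = target_propensities / logging_propensities
    return float(np.mean(importance_weights * rewards))

# Toy usage with synthetic logged data (hypothetical numbers):
rng = np.random.default_rng(0)
n = 10_000
pi0 = np.full(n, 0.25)                       # uniform logging policy over 4 actions
actions = rng.integers(0, 4, size=n)         # logged actions
rewards = (actions == 3).astype(float)       # action 3 is the only rewarding one
pi_e = np.where(actions == 3, 0.7, 0.1)      # target policy favours action 3
print(ips_policy_value(rewards, pi0, pi_e))  # concentrates near 0.7

With a uniform logging policy over four actions and a target policy that plays the only rewarding action with probability 0.7, the estimate concentrates around 0.7, the target policy's true value.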