Showing 1 - 10 of 53 for search: '"Kuzborskij, Ilja"'
We explore uncertainty quantification in large language models (LLMs), with the goal of identifying when uncertainty in responses to a given query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from …
External link: http://arxiv.org/abs/2406.02543
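A minimal illustration of sampling-based response uncertainty (not the method of the paper above, and it does not separate epistemic from aleatoric uncertainty): resample several answers to the same query and measure how spread out they are. The callable `query_model` is a hypothetical stand-in for an LLM sampler.

```python
from collections import Counter
from math import log

def response_entropy(query_model, prompt, n_samples=20):
    """Entropy of the empirical distribution over sampled answers.

    `query_model(prompt)` is a hypothetical callable returning one
    sampled (temperature > 0) answer string per call.
    """
    answers = [query_model(prompt) for _ in range(n_samples)]
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())

# High entropy across resampled answers flags queries on which the
# model's response distribution is spread out, i.e. uncertain.
```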
Author: Yadkori, Yasin Abbasi, Kuzborskij, Ilja, Stutz, David, György, András, Fisch, Adam, Doucet, Arnaud, Beloshapka, Iuliya, Weng, Wei-Hung, Yang, Yao-Yuan, Szepesvári, Csaba, Cemgil, Ali Taylan, Tomasev, Nenad
We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a nonsensical or incorrect answer …
External link: http://arxiv.org/abs/2405.01563
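A generic split-conformal-style sketch of calibrated abstention, assuming a scalar uncertainty score is available for each candidate response; this is background on the standard quantile calibration, not necessarily the exact procedure of the paper above.

```python
import numpy as np

def calibrate_threshold(cal_scores, alpha=0.1):
    """(1 - alpha) empirical quantile of calibration uncertainty scores,
    with the usual finite-sample conformal correction."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

def answer_or_abstain(score, threshold):
    # Abstain ("I don't know") whenever the uncertainty score of the
    # candidate response exceeds the calibrated threshold.
    return "abstain" if score > threshold else "answer"
```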
Let $f(\theta, X_1), \dots, f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (data), and $\theta$ is a random parameter distributed according to some data …
External link: http://arxiv.org/abs/2402.09201
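A worked restatement of why the estimation problem in the snippet above is non-trivial (background only; the paper's exact assumptions may differ, and the i.i.d. assumption below is mine, not the snippet's):

```latex
% Notation from the snippet above, with X_1, ..., X_n assumed i.i.d. from P
% for this illustration. If \theta were independent of the data, the
% empirical mean below would be an average of independent terms and
% Hoeffding-type bounds would control its deviation from \mu(\theta); when
% \theta is data-dependent, the terms are coupled through \theta and
% standard concentration no longer applies directly.
\[
  \widehat{\mu}_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} f(\theta, X_i),
  \qquad
  \mu(\theta) \;=\; \mathbb{E}_{X \sim P}\bigl[f(\theta, X)\bigr]
  \quad \text{(expectation over a fresh } X \text{ only)}.
\]
```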
We consider the problem of learning a model from multiple heterogeneous sources with the goal of performing well on a new target distribution. The goal of the learner is to mix these data sources in a target-distribution-aware way and simultaneously minimize …
External link: http://arxiv.org/abs/2309.10736
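A minimal sketch of the weighted-source objective implicit in the snippet above, with hypothetical names: each source contributes its average loss, weighted by a point on the probability simplex. How to choose the weights in a target-distribution-aware way is the question the paper studies and is not addressed here.

```python
import numpy as np

def mixed_empirical_risk(weights, source_losses):
    """Weighted combination of per-source average losses.

    `weights` lies on the probability simplex (one weight per source);
    `source_losses[k]` is an array of per-example losses on source k.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * np.mean(l) for w, l in zip(weights, source_losses))
```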
We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly …
External link: http://arxiv.org/abs/2302.05829
Author: Kuzborskij, Ilja, Szepesvári, Csaba
We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noise, neural networks …
External link: http://arxiv.org/abs/2212.13848
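A minimal sketch of the training setup described above, under illustrative choices not taken from the paper: full-batch GD on a one-hidden-layer ReLU network fit to noisy samples of the Lipschitz, nondifferentiable target |x|, with only the hidden layer trained and a fixed iteration budget acting as a crude safeguard against fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a Lipschitz, nondifferentiable target f*(x) = |x|.
n, m, lr, steps = 200, 512, 1e-2, 2000   # samples, hidden width, step size, GD steps
X = rng.uniform(-1, 1, size=(n, 1))
y = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(n)

# One-hidden-layer ReLU network; hyperparameters are purely illustrative.
W = rng.standard_normal((m, 1)) / np.sqrt(m)      # hidden weights (trained)
b = np.zeros(m)                                   # hidden biases (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # output weights (kept fixed)

for _ in range(steps):                            # full-batch gradient descent
    Z = X @ W.T + b                               # (n, m) pre-activations
    H = np.maximum(Z, 0.0)                        # ReLU features
    resid = H @ a - y                             # squared-loss residuals
    G = (Z > 0).astype(float) * (resid[:, None] * a)  # dLoss/dZ up to 2/n
    W -= lr * (2.0 / n) * (G.T @ X)
    b -= lr * (2.0 / n) * G.sum(axis=0)

train_mse = np.mean((np.maximum(X @ W.T + b, 0.0) @ a - y) ** 2)
print(f"train MSE after {steps} GD steps: {train_mse:.4f}")
```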
Author: Richards, Dominic, Kuzborskij, Ilja
We revisit on-average algorithmic stability of GD for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the NTK or PL assumptions. In particular, we show oracle-type bounds which reveal that …
External link: http://arxiv.org/abs/2107.12723
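For background on the notion used above (standard material, not the paper's specific bound), the replace-one identity links on-average stability to the expected generalization gap:

```latex
% S = (z_1, ..., z_n) is an i.i.d. sample, S^{(i)} replaces z_i by an
% independent copy z_i', A is the learning algorithm (here GD), \ell the
% loss, L_D the population risk, and L_S the empirical risk on S.
\[
  \mathbb{E}\!\left[ L_{\mathcal D}(A(S)) - L_{S}(A(S)) \right]
  \;=\;
  \frac{1}{n}\sum_{i=1}^{n}
  \mathbb{E}\!\left[ \ell\bigl(A(S^{(i)}), z_i\bigr) - \ell\bigl(A(S), z_i\bigr) \right],
\]
% so controlling how much retraining after replacing one example changes the
% loss (on-average stability) controls the expected generalization gap.
```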
Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been proposed to …
External link: http://arxiv.org/abs/2107.12685
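A minimal random-features least-squares experiment of the kind commonly used to reproduce the double descent curve (illustrative only, not the paper's exact setting): minimum-norm fits over an increasing number of random ReLU features, with test error typically peaking near the interpolation threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test, noise = 100, 20, 2000, 0.5

w_star = rng.standard_normal(d) / np.sqrt(d)           # linear teacher
X, X_te = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y = X @ w_star + noise * rng.standard_normal(n)
y_te = X_te @ w_star + noise * rng.standard_normal(n_test)

for p in [10, 50, 90, 100, 110, 200, 800]:             # number of random ReLU features
    G = rng.standard_normal((d, p)) / np.sqrt(d)
    F, F_te = np.maximum(X @ G, 0), np.maximum(X_te @ G, 0)
    beta = np.linalg.pinv(F) @ y                       # minimum-norm least squares fit
    test_mse = np.mean((F_te @ beta - y_te) ** 2)
    print(f"p = {p:4d}  test MSE = {test_mse:.3f}")

# The test error typically peaks near the interpolation threshold p ≈ n
# and decreases again as p grows: the double descent shape.
```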
Author: Kuzborskij, Ilja, Szepesvári, Csaba
We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noisy labels, neural networks …
External link: http://arxiv.org/abs/2107.05341
A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution. In this paper, focusing on fixed design linear regression …
External link: http://arxiv.org/abs/2011.00344
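For reference, transfer risk in generic notation; the paper's fixed design linear regression setting specializes this, and the symbols below are illustrative rather than taken from the paper:

```latex
% \rho is the unknown task distribution, D_\tau a training sample drawn
% from task \tau, A the meta-learner, and L_\tau(h) the expected error of
% hypothesis h on task \tau.
\[
  R(A) \;=\; \mathbb{E}_{\tau \sim \rho}\;
             \mathbb{E}_{D_\tau \sim \tau}\Bigl[ L_\tau\bigl(A(D_\tau)\bigr) \Bigr].
\]
```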