Showing 1 - 10 of 53 for search: '"Kuzborskij, Ilja"'
We explore uncertainty quantification in large language models (LLMs), with the goal of identifying when uncertainty in responses to a given query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from …
External link: http://arxiv.org/abs/2406.02543
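A minimal illustration of sampling-based response uncertainty (not the method of the paper above, and it does not separate epistemic from aleatoric uncertainty): resample several answers to the same query and measure how spread out they are. The callable `query_model` is a hypothetical stand-in for an LLM sampler.

```python
from collections import Counter
from math import log

def response_entropy(query_model, prompt, n_samples=20):
    """Entropy of the empirical distribution over sampled answers.

    `query_model(prompt)` is a hypothetical callable returning one
    sampled (temperature > 0) answer string per call.
    """
    answers = [query_model(prompt) for _ in range(n_samples)]
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())

# High entropy across resampled answers flags queries on which the
# model's response distribution is spread out, i.e. uncertain.
```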
Author: Yadkori, Yasin Abbasi, Kuzborskij, Ilja, Stutz, David, György, András, Fisch, Adam, Doucet, Arnaud, Beloshapka, Iuliya, Weng, Wei-Hung, Yang, Yao-Yuan, Szepesvári, Csaba, Cemgil, Ali Taylan, Tomasev, Nenad
We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a nonsensical or incorrect answer …
External link: http://arxiv.org/abs/2405.01563
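A generic split-conformal-style sketch of calibrated abstention, assuming a scalar uncertainty score is available for each candidate response; this is background on the standard quantile calibration, not necessarily the exact procedure of the paper above.

```python
import numpy as np

def calibrate_threshold(cal_scores, alpha=0.1):
    """(1 - alpha) empirical quantile of calibration uncertainty scores,
    with the usual finite-sample conformal correction."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

def answer_or_abstain(score, threshold):
    # Abstain ("I don't know") whenever the uncertainty score of the
    # candidate response exceeds the calibrated threshold.
    return "abstain" if score > threshold else "answer"
```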
Let $f(\theta, X_1), \dots, f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (data), and $\theta$ is a random parameter distributed according to some data …
External link: http://arxiv.org/abs/2402.09201
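A worked restatement of why the estimation problem in the snippet above is non-trivial (background only; the paper's exact assumptions may differ, and the i.i.d. assumption below is mine, not the snippet's):

```latex
% Notation from the snippet above, with X_1, ..., X_n assumed i.i.d. from P
% for this illustration. If \theta were independent of the data, the
% empirical mean below would be an average of independent terms and
% Hoeffding-type bounds would control its deviation from \mu(\theta); when
% \theta is data-dependent, the terms are coupled through \theta and
% standard concentration no longer applies directly.
\[
  \widehat{\mu}_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} f(\theta, X_i),
  \qquad
  \mu(\theta) \;=\; \mathbb{E}_{X \sim P}\bigl[f(\theta, X)\bigr]
  \quad \text{(expectation over a fresh } X \text{ only)}.
\]
```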
We consider the problem of learning a model from multiple heterogeneous sources with the goal of performing well on a new target distribution. The goal of the learner is to mix these data sources in a target-distribution-aware way and simultaneously minimize …
External link: http://arxiv.org/abs/2309.10736
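A minimal sketch of the weighted-source objective implicit in the snippet above, with hypothetical names: each source contributes its average loss, weighted by a point on the probability simplex. How to choose the weights in a target-distribution-aware way is the question the paper studies and is not addressed here.

```python
import numpy as np

def mixed_empirical_risk(weights, source_losses):
    """Weighted combination of per-source average losses.

    `weights` lies on the probability simplex (one weight per source);
    `source_losses[k]` is an array of per-example losses on source k.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * np.mean(l) for w, l in zip(weights, source_losses))
```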
We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly …
External link: http://arxiv.org/abs/2302.05829
Author: Kuzborskij, Ilja, Szepesvári, Csaba
We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noise, neural networks …
External link: http://arxiv.org/abs/2212.13848
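A minimal sketch of the training setup described above, under illustrative choices not taken from the paper: full-batch GD on a one-hidden-layer ReLU network fit to noisy samples of the Lipschitz, nondifferentiable target |x|, with only the hidden layer trained and a fixed iteration budget acting as a crude safeguard against fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a Lipschitz, nondifferentiable target f*(x) = |x|.
n, m, lr, steps = 200, 512, 1e-2, 2000   # samples, hidden width, step size, GD steps
X = rng.uniform(-1, 1, size=(n, 1))
y = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(n)

# One-hidden-layer ReLU network; hyperparameters are purely illustrative.
W = rng.standard_normal((m, 1)) / np.sqrt(m)      # hidden weights (trained)
b = np.zeros(m)                                   # hidden biases (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # output weights (kept fixed)

for _ in range(steps):                            # full-batch gradient descent
    Z = X @ W.T + b                               # (n, m) pre-activations
    H = np.maximum(Z, 0.0)                        # ReLU features
    resid = H @ a - y                             # squared-loss residuals
    G = (Z > 0).astype(float) * (resid[:, None] * a)  # dLoss/dZ up to 2/n
    W -= lr * (2.0 / n) * (G.T @ X)
    b -= lr * (2.0 / n) * G.sum(axis=0)

train_mse = np.mean((np.maximum(X @ W.T + b, 0.0) @ a - y) ** 2)
print(f"train MSE after {steps} GD steps: {train_mse:.4f}")
```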
Author: Richards, Dominic, Kuzborskij, Ilja
We revisit on-average algorithmic stability of GD for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the NTK or PL assumptions. In particular, we show oracle-type bounds which reveal that …
External link: http://arxiv.org/abs/2107.12723
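For background on the notion used above (standard material, not the paper's specific bound), the replace-one identity links on-average stability to the expected generalization gap:

```latex
% S = (z_1, ..., z_n) is an i.i.d. sample, S^{(i)} replaces z_i by an
% independent copy z_i', A is the learning algorithm (here GD), \ell the
% loss, L_D the population risk, and L_S the empirical risk on S.
\[
  \mathbb{E}\!\left[ L_{\mathcal D}(A(S)) - L_{S}(A(S)) \right]
  \;=\;
  \frac{1}{n}\sum_{i=1}^{n}
  \mathbb{E}\!\left[ \ell\bigl(A(S^{(i)}), z_i\bigr) - \ell\bigl(A(S), z_i\bigr) \right],
\]
% so controlling how much retraining after replacing one example changes the
% loss (on-average stability) controls the expected generalization gap.
```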
Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been proposed to …
External link: http://arxiv.org/abs/2107.12685
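A minimal random-features least-squares experiment of the kind commonly used to reproduce the double descent curve (illustrative only, not the paper's exact setting): minimum-norm fits over an increasing number of random ReLU features, with test error typically peaking near the interpolation threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test, noise = 100, 20, 2000, 0.5

w_star = rng.standard_normal(d) / np.sqrt(d)           # linear teacher
X, X_te = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y = X @ w_star + noise * rng.standard_normal(n)
y_te = X_te @ w_star + noise * rng.standard_normal(n_test)

for p in [10, 50, 90, 100, 110, 200, 800]:             # number of random ReLU features
    G = rng.standard_normal((d, p)) / np.sqrt(d)
    F, F_te = np.maximum(X @ G, 0), np.maximum(X_te @ G, 0)
    beta = np.linalg.pinv(F) @ y                       # minimum-norm least squares fit
    test_mse = np.mean((F_te @ beta - y_te) ** 2)
    print(f"p = {p:4d}  test MSE = {test_mse:.3f}")

# The test error typically peaks near the interpolation threshold p ≈ n
# and decreases again as p grows: the double descent shape.
```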
Author: Kuzborskij, Ilja, Szepesvári, Csaba
We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noisy labels, neural networks …
External link: http://arxiv.org/abs/2107.05341
A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution. In this paper, focusing on fixed design linear regression …
External link: http://arxiv.org/abs/2011.00344
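For reference, transfer risk in generic notation; the paper's fixed design linear regression setting specializes this, and the symbols below are illustrative rather than taken from the paper:

```latex
% \rho is the unknown task distribution, D_\tau a training sample drawn
% from task \tau, A the meta-learner, and L_\tau(h) the expected error of
% hypothesis h on task \tau.
\[
  R(A) \;=\; \mathbb{E}_{\tau \sim \rho}\;
             \mathbb{E}_{D_\tau \sim \tau}\Bigl[ L_\tau\bigl(A(D_\tau)\bigr) \Bigr].
\]
```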