Low-degree learning and the metric entropy of polynomials

Autor: Alexandros Eskenazis, Paata Ivanisvili, Lauritz Streck
Jazyk: angličtina
Rok vydání: 2023
Předmět:
Zdroj: Discrete Analysis (2023)
Druh dokumentu: article
ISSN: 2397-3129
DOI: 10.19086/da.88507
Popis: Low-degree learning and the metric entropy of polynomials, Discrete Analysis 2023:17, 23 pp. This paper is a follow-up to a paper by the first two authors on the general problem of designing algorithms that can efficiently learn functions defined on the Boolean cube $\{-1,1\}^n$, given suitable conditions on the functions. Given a function $f$, the rough idea is to take a series of sample points, calculate $f$ at those points, and guess a function $g$ that is close to $f$. Obviously, without strong information about $f$, nothing non-trivial can be done. If, for example, the values of $f$ are chosen uniformly and independently from $[-1,1]$, then one learns nothing from the sample values other than the sample values themselves. However, many functions that one actually wants to learn are far from arbitrary, so one can try to identify suitable properties that make them amenable to learning from a far smaller number of samples. One set-up considered by the authors is where $f:\{-1,1\}^n\to[-1,1]$ is a low-degree polynomial and the samples are taken randomly. Since $x^2=1$ when $x\in\{-1,1\}$, every polynomial function $f$ of degree $d$ defined on the Boolean cube has a formula of the form $f(x)=\sum_{|A|\leq d}\lambda_A\prod_{i\in A}x_i$, where $A$ ranges over subsets of $\{1,2,\dots,n\}$. The functions $x\mapsto\prod_{i\in A}x_i$ are the Walsh functions, and the $\lambda_A$ are therefore the Fourier coefficients of $f$ that correspond to sets $A$ of size at most $d$ -- the size of $A$ is often referred to as the degree of the Walsh function, and one often refers informally to "the Fourier coefficients of degree at most $d$" or "the degree-$d$ part of the Fourier expansion". From simple dimension considerations, one sees easily that one cannot hope to determine a degree-$d$ polynomial $f$ exactly from fewer than $\sum_{r=0}^d\binom nr$ samples (even if it is required to take values in $[-1,1]$), so the aim is to determine it _approximately_. In this paper, and in the previous paper of the first two authors, the approximation considered is an $L_2$ one. Thus, the aim is to find an algorithm that does the following: given a function $f:\{-1,1\}\to[-1,1]$ of degree at most $d$, it takes $m$ random samples $(x,f(x))$, and outputs a function $g$ such that $\|g-f\|_2
Databáze: Directory of Open Access Journals