Theory of statistical learning and application

Author: Babić, Marija
Contributors: Basrak, Bojan
Language: Croatian
Year of publication: 2021
Description: At the beginning of this thesis we define the simplest learning rule, Empirical Risk Minimisation (the ERM rule). After that we define the main formal model of learning, the PAC model, which relies on the realizability (RA) assumption, and its agnostic variant, in which this assumption is omitted. By introducing a generalised loss function we extend the PAC model so that it can be used not only for binary classification but on a wider range of learning problems. The main theorems are given in the central part of the thesis. The No Free Lunch theorem states that, without any additional assumptions on the hypothesis class, no learner can succeed on all learning tasks. Every learning task therefore requires individual analysis, and when choosing the hypothesis class we must balance the approximation error against the estimation error, a balance known in the literature as the bias-complexity tradeoff. Using the Sauer-Shelah-Perles lemma and Massart's lemma, we prove the key theorem of this thesis, the Fundamental Theorem of Statistical Learning, which reveals the connection between the notions of uniform convergence, (agnostic) PAC learning, the ERM rule and the VC dimension for binary classification problems. The theorem states that a hypothesis class is (agnostic) PAC learnable if and only if its VC dimension is finite.
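For orientation, the basic objects mentioned above can be stated compactly. The following is the standard textbook formulation in our own notation (a sketch, not an excerpt from the thesis); here the realizability assumption of the plain PAC model asserts that some h* in the class achieves true risk zero.

```latex
% Empirical risk of a hypothesis h on a sample S = ((x_1,y_1),\dots,(x_m,y_m)):
L_S(h) \;=\; \frac{1}{m}\,\bigl|\{\, i \in [m] : h(x_i) \neq y_i \,\}\bigr|

% The ERM rule returns any empirical-risk minimiser in the class \mathcal{H}:
\mathrm{ERM}_{\mathcal{H}}(S) \;\in\; \operatorname*{argmin}_{h \in \mathcal{H}} L_S(h)

% Agnostic PAC learnability: there exist m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}
% and a learner A such that for all \varepsilon, \delta \in (0,1) and every
% distribution \mathcal{D}, if m \ge m_{\mathcal{H}}(\varepsilon,\delta), then
% with probability at least 1-\delta over S \sim \mathcal{D}^m:
L_{\mathcal{D}}\bigl(A(S)\bigr) \;\le\; \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \varepsilon
```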
The quantitative version of the same theorem gives upper and lower bounds on the sample complexity of learning, which depend, among other things, on the VC dimension of the corresponding hypothesis class. In the last part of the thesis we focus on two algorithms for binary classification problems. The first of them uses the hypothesis class of halfspaces, which belongs to the family of linear predictors, one of the most widely used families of hypothesis classes. Using Radon's theorem we show that the VC dimension of halfspaces is finite, and after that we focus on the Perceptron algorithm, an iterative version of the ERM rule. The second algorithm, AdaBoost, is one of the family of Boosting algorithms; it outputs a hypothesis that depends on a linear combination of certain simple hypotheses.
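The quantitative bounds referred to above take the following standard form in the literature (as in Shalev-Shwartz and Ben-David's Understanding Machine Learning; C_1 and C_2 are absolute constants and d is the VC dimension of the class; this restates the textbook theorem rather than quoting the thesis):

```latex
% Realizable PAC learning of a class with VC dimension d:
C_1\,\frac{d + \log(1/\delta)}{\varepsilon}
\;\le\; m_{\mathcal{H}}(\varepsilon,\delta) \;\le\;
C_2\,\frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon}

% Agnostic PAC learning of the same class:
C_1\,\frac{d + \log(1/\delta)}{\varepsilon^{2}}
\;\le\; m_{\mathcal{H}}(\varepsilon,\delta) \;\le\;
C_2\,\frac{d + \log(1/\delta)}{\varepsilon^{2}}
```

A minimal sketch of the Perceptron algorithm in this setting, assuming labels in {-1, +1} and the bias term folded into the weight vector via a constant feature (an illustration, not code from the thesis):

```python
import numpy as np

def perceptron(X, y, max_iter=1000):
    """Batch Perceptron: X has shape (m, d), y has entries in {-1, +1}.

    Repeatedly finds a misclassified example and adds y_i * x_i to the
    weight vector; on linearly separable data the number of updates is
    bounded, so the loop terminates with a separating hyperplane.
    """
    m, d = X.shape
    # Fold the bias term into the weights via a constant feature.
    Xb = np.hstack([X, np.ones((m, 1))])
    w = np.zeros(d + 1)
    for _ in range(max_iter):
        # Margins y_i * <w, x_i>; a non-positive margin means a mistake.
        mistakes = np.nonzero(y * (Xb @ w) <= 0)[0]
        if mistakes.size == 0:
            return w  # all points classified correctly
        w += y[mistakes[0]] * Xb[mistakes[0]]  # the Perceptron update
    return w  # may not separate the data if max_iter was reached

# Usage: two linearly separable clusters in the plane.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))  # matches y
```

Similarly, a compact sketch of AdaBoost with decision stumps playing the role of the simple ("weak") hypotheses; the brute-force stump search below is an illustrative choice of weak learner, not the thesis's construction:

```python
import numpy as np

def adaboost(X, y, rounds=5):
    """AdaBoost with decision stumps as weak learners.

    X: (m, d) features, y: labels in {-1, +1}. Returns a list of tuples
    (weight, feature, threshold, polarity) defining the combined
    hypothesis sign(sum_t weight_t * stump_t(x)).
    """
    m, d = X.shape
    D = np.full(m, 1.0 / m)  # distribution over training examples
    ensemble = []
    for _ in range(rounds):
        # Weak learner: the stump with the smallest weighted error.
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= thr, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = max(err, 1e-12)                # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this stump
        ensemble.append((alpha, j, thr, pol))
        # Re-weight: increase the mass of misclassified examples.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
    return ensemble

def predict(ensemble, X):
    score = np.zeros(X.shape[0])
    for alpha, j, thr, pol in ensemble:
        score += alpha * pol * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(score)

# Usage: alternating 1-D labels that no single stump can fit,
# but a linear combination of a few stumps can.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, -1, 1, -1])
print(predict(adaboost(X, y), X))  # matches y
```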
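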
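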
Database: OpenAIRE