Statistical learning for predictive targeting in online advertising

Author: Fruergaard, Bjarne Ørum
Language: English
Year of publication: 2015
Source: Fruergaard, B Ø 2015, Statistical learning for predictive targeting in online advertising. DTU Compute PHD-2014, no. 355, Technical University of Denmark, Kgs. Lyngby.
Description: In this PhD thesis we investigate machine learning methods with applications in data-driven marketing (“computational advertising”). Computational advertising is a broad discipline spanning methods used for targeted marketing on the Internet. Such systems build on algorithms for making decisions automatically. It is in particular within the display of web banner advertisements that we investigate machine learning methods for optimizing the decision process in this thesis. The PhD project is an industrial collaboration with Adform, a provider of a digital marketing platform. This influences the analyses and methods we investigate, since they should be applicable in Adform's systems. For this reason we also place extra focus on scalable, high-performance methods. The concrete application used to benchmark our results is click-through rate prediction. In this application we are interested in estimating the probability that a user will click on a given advertisement. This is based on information about the user, the advertisement, a context, and other signals, such as time. This is applied in particular in online auctions, where advertisers bid in real time to win the right to show their own advertisement. The contributions of this thesis include the application of a hybrid model, containing both explicit and latent information, for estimating click-through rates, which is an extension of the current model in production at Adform. Our results confirm that latent information can be learned from data and that it can improve click-through rate estimation. We also introduce variations of Bayesian generative models for stochastic blockmodeling of profiles based on browsing histories. To improve click-through rate estimation, we can follow a two-step procedure: 1) first, profiles are trained from browsing history, and 2) next, the profiles can be used as additional information in a model for estimating click-through rates.
We show empirically how this also contributes to better estimation of click-through rates. Finally, we introduce a new model and method for detecting overlapping groups in observed networks. The model, which we call “multiple-networks stochastic blockmodeling”, operates under the assumption that the observed network can be seen as an aggregation of many subnetworks with simple block structure.
The focus of this thesis is the investigation of machine learning methods with applications in computational advertising. Computational advertising is the broad discipline of building systems which can reach audiences browsing the Internet with targeted advertisements. At the core of such systems, algorithms are needed for making decisions. It is in one particular instance of computational advertising, namely web banner advertising, that we investigate machine learning methods to assist in and make decisions in order to optimize the placement of ads. The industrial partner in this work is Adform, an international online advertising technology partner. This means that the analyses and methods in this work are developed with particular use-cases within Adform in mind and thus also need to be applicable in Adform’s technology stack. This implies extra thought on scalability and performance. The particular use-case which is used as a benchmark for our results is click-through rate prediction. In this task one aims to predict the probability that a user will click on an advertisement, based on attributes of the user, the advertisement, the context, and other signals, such as time. This has its main application in real-time bidding ad exchanges, where each advertiser is given a chance to place bids for showing their ad while the page loads, and the winning bidder gets to display their banner.
The contributions of this thesis include the application of a hybrid model of explicit and latent features for learning probabilities of clicks, which is a methodological extension of the current model in production at Adform. Our findings confirm that latent features can increase predictive performance in the setup of click-through rate prediction. They also reveal a tedious process for tuning the model for optimal performance. We also present variations of Bayesian generative models for stochastic blockmodeling for inference of structure based on browsing patterns. Applying this structural information to improve click-through rate prediction becomes a two-step procedure: 1) learn user and URL profiles from browsing patterns, 2) use the profiles as additional features in a click-through rate prediction model. The assumption we implicitly make is reasonable: users and URLs that are grouped together based on browsing patterns will have similar responses to ads, i.e., the groupings can be used as predictors of clicks. We report successful examples of applying this approach in practice. Finally, we introduce the multiple-networks stochastic blockmodel (MNSBM), a model for efficient overlapping community detection in complex networks which can be assumed to be an aggregation of multiple block-structured subnetworks.
Database: OpenAIRE